290 Commits

Author SHA1 Message Date
hev
0d17e1f0e5
[LoongArch] Revert sp adjustment in prologue (#88110)
After commit 18c5f3c3 ("[RegisterScavenger][RISCV] Don't search for
FrameSetup instrs if we were searching from Non-FrameSetup instrs"), we
can revert the `sp` adjustment 4e2364a2 ("[LoongArch] Add emergency
spill slot for GPR for large frames") to generate better code, as the
issue with `RegScavenger` has been resolved.

Fixes #88109
2024-04-10 17:13:25 +08:00
Craig Topper
acab142751 [LegalizeDAG] Freeze index when converting insert_elt/insert_subvector to load/store on stack.
We try to clamp the index to be within the bounds of the stack object
we create, but if we don't freeze it, poison can propagate into the
clamp code. This can cause the access to leave the bounds of the
stack object.

We have other instances of this issue in type legalization and extract_elt/subvector,
but posting this patch first for direction check.

Fixes #86717
2024-03-27 13:01:23 -07:00
Lu Weining
e4edbae0aa
Revert "[llvm][LoongArch] Improve loongarch_lasx_xvpermi_q intrinsic" (#84708)
Reverts llvm/llvm-project#82984

See the discussion in https://github.com/llvm/llvm-project/pull/83540.
2024-03-13 11:51:47 +08:00
wanglei
edd4c6c6dc
[LoongArch] Make sure that the LoongArchISD::BSTRINS node uses the correct MSB value (#84454)
The `MSB` must not be greater than `GRLen`. Without this patch, newly
added test cases will crash with LoongArch32, resulting in a 'cannot
select' error.
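
The constraint can be illustrated with a small model of a bit-string insert. This is only a sketch: the helper name `bstrins` and its argument order are chosen for illustration, bit positions are assumed to be numbered from 0 (so the largest valid `msb` is `grlen - 1`), and it is not the code from the patch.

```python
def bstrins(rd, rj, msb, lsb, grlen=64):
    """Model of a bit-string insert: copy the low bits of rj into rd[msb:lsb].

    Assumption for this sketch: bits are numbered from 0, so msb must be
    smaller than the register width grlen.
    """
    if not (0 <= lsb <= msb < grlen):
        raise ValueError("msb/lsb out of range for this register width")
    width = msb - lsb + 1
    field_mask = ((1 << width) - 1) << lsb
    reg_mask = (1 << grlen) - 1
    # Clear the destination field, then drop in the shifted source bits.
    return ((rd & ~field_mask) | ((rj << lsb) & field_mask)) & reg_mask


# Clearing bits [15:8] of a 32-bit register value.
print(hex(bstrins(0xFFFFFFFF, 0, 15, 8, grlen=32)))
```

With `grlen=32`, a request such as `msb=32` is rejected by the range check, which mirrors why an out-of-range `MSB` cannot be selected on LoongArch32.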
2024-03-11 08:59:17 +08:00
wanglei
a5c90e48b6
[LoongArch] Switch to the Machine Scheduler (#83759)
The SelectionDAG scheduling preference now becomes source order
scheduling (machine scheduler generates better code -- even without
there being a machine model defined for LoongArch yet).

Most of the test changes are trivial instruction reorderings and
differing register allocations, without any obvious performance impact.

This is similar to commit: 3d0fbafd0bce43bb9106230a45d1130f7a40e5ec
2024-03-05 09:15:44 +08:00
Lu Weining
5f058aa211
[LoongArch] Override LoongArchTargetLowering::getExtendForAtomicCmpSwapArg (#83656)
This patch aims to solve Firefox issue:
https://bugzilla.mozilla.org/show_bug.cgi?id=1882301

Similar to 616289ed2922. Currently LoongArch uses an ll.[wd]/sc.[wd]
loop for ATOMIC_CMP_XCHG. Because the comparison in the loop is
full-width (i.e. the `bne` instruction), we must sign extend the input
comparison argument.

Note that LoongArch ISA manual V1.1 has introduced compare-and-swap
instructions. We would change the implementation (return `ANY_EXTEND`)
when we support them.
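
A small model shows why the extension kind matters for a 32-bit compare-and-swap on a 64-bit target. It is a sketch under stated assumptions: the helper names are hypothetical, and the only hardware behaviour modelled is that `ll.w` sign-extends the loaded word and that the comparison looks at all 64 register bits.

```python
MASK64 = (1 << 64) - 1


def sign_extend_w(value):
    """Sign-extend a 32-bit value into a 64-bit register image."""
    value &= 0xFFFFFFFF
    if value >> 31:
        value -= 1 << 32
    return value & MASK64


def zero_extend_w(value):
    """Zero-extend a 32-bit value into a 64-bit register image."""
    return value & 0xFFFFFFFF


def compare_in_loop(memory_word, expected_reg):
    """Model the loop's comparison: ll.w sign-extends, bne compares 64 bits."""
    loaded = sign_extend_w(memory_word)
    return loaded == expected_reg


word = 0xFFFFFFFF  # the 32-bit value stored in memory, i.e. -1
print(compare_in_loop(word, zero_extend_w(word)))  # zero-extended argument
print(compare_in_loop(word, sign_extend_w(word)))  # sign-extended argument
```

The zero-extended argument never matches a word whose top bit is set, even though the 32-bit values are equal, so the exchange would fail spuriously.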
2024-03-04 08:38:52 +08:00
leecheechen
d7c80bba69
[llvm][LoongArch] Improve loongarch_lasx_xvpermi_q intrinsic (#82984)
For instruction xvpermi.q, only [1:0] and [5:4] bits of operands[3] are
used. The unused bits in operands[3] need to be set to 0 to avoid
causing undefined behavior.
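
The masking described above amounts to keeping bits [1:0] and [5:4] of the immediate. A minimal sketch, with a hypothetical helper name that does not come from the patch:

```python
# Bits [1:0] and [5:4] are the only ones the commit message says are used.
XVPERMI_Q_USED_BITS = 0b0011_0011


def sanitize_xvpermi_q_imm(imm):
    """Clear the immediate bits that xvpermi.q does not use."""
    return imm & XVPERMI_Q_USED_BITS


print(hex(sanitize_xvpermi_q_imm(0xFF)))
```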
2024-02-27 15:38:11 +08:00
Jack Styles
28233408a2
[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. (#77770)
When using Greedy Register Allocation, there are times where
early-clobber values are ignored, and assigned the same register. This
is illegal behaviour for these instructions. To get around this, using
Pseudo instructions for early-clobber registers gives them a definition
and allows Greedy to assign them to a different register. This then
meets the ARM Architecture Reference Manual and matches the defined
behaviour.

This patch takes the existing RISC-V patch and makes it target
independent, then adds support for the ARM Architecture. Doing this will
ensure early-clobber constraints are followed when using the ARM
Architecture. Making the pass target independent also opens up the
possibility of adding support for other architectures in the future.
2024-02-26 12:12:31 +00:00
hev
8be39b3901
[LoongArch] Improve pattern matching for AddLike predicate (#82767)
This commit updates the pattern matching logic for the `AddLike`
predicate in `LoongArchInstrInfo.td` to use the
`isBaseWithConstantOffset` function provided by `CurDAG`. This
optimization aims to improve the efficiency of pattern matching by
identifying cases where the operation can be represented as a base
address plus a constant offset, which can lead to more efficient code
generation.
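
The idea behind treating an `or` as an add can be shown with plain integers: when the base and the constant offset share no set bits, `base | offset` equals `base + offset`. The sketch below uses a hypothetical helper and exact values, whereas the real check reasons about known-zero bits.

```python
def or_is_equivalent_to_add(base, offset):
    """An OR behaves like an ADD when no bit is set in both operands."""
    return (base & offset) == 0


aligned_base = 0x1000  # low bits known to be zero
offset = 0x10

print(or_is_equivalent_to_add(aligned_base, offset))
print((aligned_base | offset) == (aligned_base + offset))
```

If the operands overlap, for example `0x1010 | 0x10`, the OR no longer matches the sum and the pattern must not fire.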
2024-02-26 11:13:21 +08:00
hev
c747b24262
[NFC] Precommit a memcpy test for isOrEquivalentToAdd (#82758) 2024-02-23 21:43:53 +08:00
hev
dd3e0a4643
[LoongArch] Assume no-op addrspacecasts by default (#82332)
This PR indicates that `addrspacecasts` are always no-ops on LoongArch.

Fixes #82330
2024-02-21 21:15:17 +08:00
DianQK
ccb46e8365
Reapply "[RegisterCoalescer] Clear instructions not recorded in ErasedInstrs but erased (#79820)"
This reverts commit 8316bf34ac21117f35bc8e6fafa2b3e7da75e1d5.
2024-02-09 15:58:48 +08:00
DianQK
8316bf34ac
Revert "[RegisterCoalescer] Clear instructions not recorded in ErasedInstrs but erased (#79820)"
This reverts commit 95b14da678f4670283240ef4cf60f3a39bed97b4.
2024-02-09 15:54:54 +08:00
Quentin Dian
95b14da678
[RegisterCoalescer] Clear instructions not recorded in ErasedInstrs but erased (#79820)
Fixes #79718. Fixes #71178.

The same instructions may exist in an iteration. We cannot immediately
delete instructions in `ErasedInstrs`.
2024-02-09 15:29:05 +08:00
Nikita Popov
ff9af4c43a [CodeGen] Convert tests to opaque pointers (NFC) 2024-02-05 14:07:09 +01:00
yjijd
44ba6ebc99
[CodeGen][LoongArch] Set FP_TO_SINT/FP_TO_UINT to legal for vector types (#79107)
Support the following conversions:
v4f32->v4i32, v2f64->v2i64(LSX)
v8f32->v8i32, v4f64->v4i64(LASX)
v4f32->v4i64, v4f64->v4i32(LASX)
2024-01-23 15:57:06 +08:00
yjijd
f799f93692
[CodeGen][LoongArch] Set SINT_TO_FP/UINT_TO_FP to legal for vector types (#78924)
Support the following conversions:
v4i32->v4f32, v2i64->v2f64(LSX)
v8i32->v8f32, v4i64->v4f64(LASX)
v4i32->v4f64, v4i64->v4f32(LASX)
2024-01-23 15:16:23 +08:00
Ami-zhang
fcb8342a21
[LoongArch] Add definitions and feature 'frecipe' for FP approximation intrinsics/builtins (#78962)
This PR adds definitions and the 'frecipe' feature for FP approximation
intrinsics/builtins. In addition, it adds and extends the related
test cases.
2024-01-23 14:24:58 +08:00
Fangrui Song
7620f03ef7
[MC] Parse SHF_LINK_ORDER argument before section group name (#77407)
When both SHF_LINK_ORDER | SHF_GROUP flags are set, GNU assembler from
2.35 onwards (https://sourceware.org/PR25381
https://sourceware.org/binutils/docs/as/Section.html) parses the
SHF_LINK_ORDER argument before section group name, different from us.

This is unfortunate, but does not matter because the `.section` flag `o`
is a niche feature only used by compiler instrumentations, not adopted
by hand-written assembly, and using both flags is extremely rare. Let's
just match GNU assembler. There is another benefit: we now support
zero-flag section group with the SHF_LINK_ORDER flag, for which there
was previously no syntax.

While here, print 'G' after 'o' to be clear that the 'G' argument is
parsed after the 'o' argument. To make the diff smaller, we don't print
'G' after 'w' in the absence of 'o' for now.
2024-01-09 10:42:34 -08:00
wanglei
98c6aa7229
[LoongArch] Implement LoongArchRegisterInfo::canRealignStack() (#76913)
This patch fixes the crash issue in the test:
CodeGen/LoongArch/can-not-realign-stack.ll

Register allocator may spill virtual registers to the stack, which
introduces stack alignment requirements (when the size of spilled
registers exceeds the default alignment size of the stack). If a
function does not have stack alignment requirements before register
allocation, registers used for stack alignment will not be preserved.

Therefore, we should implement `canRealignStack()` to inform the
register allocator whether it is allowed to perform stack realignment
operations.
2024-01-09 20:35:49 +08:00
wanglei
f499472de3 [LoongArch] Pre-commit test for #76913. NFC
This test will crash with expensive checks enabled.

Crash message:
```
*** Bad machine code: Using an undefined physical register ***
- function:    main
- basic block: %bb.0 entry (0x20fee70)
- instruction: $r3 = frame-destroy ADDI_D $r22, -288
- operand 1:   $r22
```
2024-01-09 20:32:20 +08:00
hev
16094cb629
[llvm][LoongArch] Support per-global code model attribute for LoongArch (#72079)
This patch takes the code model from the global variable's attribute if
it has one; otherwise the target's code model is used.

---------

Signed-off-by: WANG Rui <wangrui@loongson.cn>
2024-01-06 13:36:09 +08:00
wanglei
c56a5e895a [LoongArch] Reimplement the expansion of PseudoLA*_LARGE instructions (#76555)
Following the description in psABI v2.30
(https://github.com/loongson/la-abi-specs/releases/tag/v2.30), move the
expansion of the relevant pseudo-instructions from the
`LoongArchPreRAExpandPseudo` pass to the `LoongArchExpandPseudo` pass, to
ensure that the code sequences of `PseudoLA*_LARGE` instructions and
medium code model function calls are not reordered by the scheduler.
2024-01-05 10:57:53 +08:00
wanglei
3d6fc35b90 [LoongArch] Pre-commit test for #76555. NFC 2024-01-05 10:57:40 +08:00
wanglei
2cf420d5b8 [LoongArch] Emit function call code sequence as PCADDU18I+JIRL in medium code model
According to the description of the psABI v2.20:
https://github.com/loongson/la-abi-specs/releases/tag/v2.20, adjustments
are made to the function call instructions under the medium code model.

At the same time, AsmParser has already supported parsing the call36 and
tail36 macro instructions.
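
How a call offset is divided between the two instructions can be sketched as follows. The split assumes the documented instruction semantics (`pcaddu18i` adds a sign-extended 20-bit immediate shifted left by 18, `jirl` adds a sign-extended 16-bit immediate shifted left by 2). The function names are hypothetical and this is not the code from the patch.

```python
def sext(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def call_target(pc, hi20, lo16):
    """Address reached by pcaddu18i (hi20) followed by jirl (lo16)."""
    base = pc + (sext(hi20, 20) << 18)   # pcaddu18i
    return base + (sext(lo16, 16) << 2)  # jirl


def split_call36(offset):
    """Split a PC-relative byte offset into the two immediates."""
    if offset % 4 != 0:
        raise ValueError("call targets must be 4-byte aligned")
    # Adding 1 << 17 compensates for the sign extension of the low part.
    hi20 = ((offset + (1 << 17)) >> 18) & 0xFFFFF
    lo16 = (offset >> 2) & 0xFFFF
    if call_target(0, hi20, lo16) != offset:
        raise ValueError("offset does not fit in the PCADDU18I+JIRL pair")
    return hi20, lo16


hi20, lo16 = split_call36(0x20000)
print(hex(hi20), hex(lo16), hex(call_target(0x1000, hi20, lo16)))
```

The round-trip check in `split_call36` doubles as the range check, so the exact limits of the reachable window do not have to be spelled out.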
2024-01-05 10:56:47 +08:00
wanglei
da5378e87e [LoongArch] Fix incorrect pattern [X]VBITSELI_B instructions
Adjusted the operand order of [X]VBITSELI_B to correctly match vselect.
2023-12-29 14:44:29 +08:00
wanglei
c7367f985e [LoongArch] Fix incorrect pattern XVREPL128VEI_{W/D} instructions
Remove the incorrect patterns for `XVREPL128VEI_{W/D}` instructions,
and add correct patterns for XVREPLVE0_{W/D} instructions
2023-12-29 14:03:53 +08:00
wanglei
47c88bcd5d [LoongArch] Fix LASX vector_extract codegen
Custom lowering `ISD::EXTRACT_VECTOR_ELT` with lasx.
2023-12-29 13:48:53 +08:00
wanglei
af999c4be9
[LoongArch] Add codegen support for [X]VF{MSUB/NMADD/NMSUB}.{S/D} instructions (#74819)
This is similar to single and double-precision floating-point
instructions.
2023-12-11 10:37:22 +08:00
wanglei
cdc3732566 [LoongArch] Mark ISD::FNEG as legal 2023-12-08 15:07:58 +08:00
wanglei
9f70e708a7
[LoongArch] Make ISD::FSQRT a legal operation with lsx/lasx feature (#74795)
And add some patterns:
1. (fdiv 1.0, vector)
2. (fdiv 1.0, (fsqrt vector))
2023-12-08 14:16:26 +08:00
wanglei
9ff7d0ebeb
[LoongArch] Add codegen support for icmp/fcmp with lsx/lasx features (#74700)
Mark the ISD::SETCC node as legal, and add handling for the condition
codes of vector types.
2023-12-07 20:11:43 +08:00
wanglei
de21308f78 [LoongArch] Make ISD::VSELECT a legal operation with lsx/lasx 2023-12-06 16:43:38 +08:00
wanglei
e9cd197d15 [LoongArch] Support MULHS/MULHU with lsx/lasx
Mark MULHS/MULHU nodes as legal and add the necessary patterns.
2023-12-04 10:58:05 +08:00
wanglei
a60a5421b6 Reland "[LoongArch] Support CTLZ with lsx/lasx"
This patch simultaneously adds tests for `CTPOP`.

This relands 07cec73dcd095035257eec1f213d273b10988130 with fixed tests.
2023-12-02 17:22:40 +08:00
wanglei
63e6bba0c3 Revert "[LoongArch] Support CTLZ with lsx/lasx"
This reverts commit 07cec73dcd095035257eec1f213d273b10988130.
2023-12-02 17:17:48 +08:00
wanglei
07cec73dcd [LoongArch] Support CTLZ with lsx/lasx
This patch simultaneously adds tests for `CTPOP`.
2023-12-02 17:13:36 +08:00
wanglei
66a3e4fafb [LoongArch] Override TargetLowering::isShuffleMaskLegal
By default, `isShuffleMaskLegal` always returns true, which can result
in the expansion of `BUILD_VECTOR` into a `VECTOR_SHUFFLE` node in
certain situations. Subsequently, the `VECTOR_SHUFFLE` node is expanded
again into a `BUILD_VECTOR`, leading to an infinite loop.
To address this, we always return false, allowing the expansion of
`BUILD_VECTOR` through the stack.
2023-12-02 14:25:17 +08:00
leecheechen
dbbc7c31c8
[LoongArch] Add some binary IR instructions testcases for LASX (#74031)
The IR instructions include:
- Binary Operations: add fadd sub fsub mul fmul udiv sdiv fdiv
- Bitwise Binary Operations: shl lshr ashr
2023-12-01 13:14:11 +08:00
wanglei
ca66df3b02 [LoongArch] Add more and/or/xor patterns for vector types 2023-12-01 10:28:41 +08:00
wanglei
add224c0a0 [LoongArch] Custom lowering ISD::BUILD_VECTOR 2023-12-01 09:13:39 +08:00
wanglei
f2cbd1fdf7 [LoongArch] Add codegen support for insertelement 2023-12-01 09:13:39 +08:00
leecheechen
29a0f3ec2b
[LoongArch] Add some binary IR instructions testcases for LSX (#73929)
The IR instructions include:
- Binary Operations: add fadd sub fsub mul fmul udiv sdiv fdiv
- Bitwise Binary Operations: shl lshr ashr
2023-11-30 21:41:18 +08:00
wanglei
b72456120f
[LoongArch] Add codegen support for extractelement (#73759)
Add codegen support for extractelement when the `lsx` or `lasx`
feature is enabled.
2023-11-30 17:29:18 +08:00
wanglei
5e7e0d6032
[LoongArch] Fix pattern for FNMSUB_{S/D} instructions (#73742)
```
when a=c=-0.0, b=0.0:
-(a * b + (-c)) = -0.0
-a * b + c = 0.0
(fneg (fma a, b (-c))) != (fma (fneg a), b ,c)
```

See https://reviews.llvm.org/D90901 for a similar discussion on X86.
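
The signed-zero case above can be reproduced with ordinary floats. In this sketch a plain multiply-add stands in for `fma`, which is exact for these zero inputs. The function names are hypothetical.

```python
import math


def negated_fma(a, b, c):
    """(fneg (fma a, b, (fneg c))), using an unfused multiply-add."""
    return -(a * b + (-c))


def fma_of_negated_operand(a, b, c):
    """(fma (fneg a), b, c), using an unfused multiply-add."""
    return (-a) * b + c


a = c = -0.0
b = 0.0
# copysign exposes the sign of a zero, which == would hide.
print(math.copysign(1.0, negated_fma(a, b, c)))
print(math.copysign(1.0, fma_of_negated_operand(a, b, c)))
```

The two results compare equal with `==`, yet their sign bits differ, which is why the two DAG forms are not interchangeable.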
2023-11-29 15:21:21 +08:00
hev
0d9f557b6c
[LoongArch] Disable mulodi4 and muloti4 libcalls (#73199)
These library functions exist only in compiler-rt, not in libgcc, so
code that calls them fails to link unless compiler-rt is linked in.

Fixes https://github.com/ClangBuiltLinux/linux/issues/1958
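
For context, the operation behind these libcalls is a signed multiply that also reports overflow. The sketch below models the 64-bit case with Python's unbounded integers. The helper name and the tuple return value are illustrative assumptions, not the compiler-rt interface.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def smul_with_overflow_i64(a, b):
    """Return (wrapped 64-bit product, overflow flag) for signed operands."""
    full = a * b
    # Wrap into the signed 64-bit range the way two's complement would.
    wrapped = ((full + (1 << 63)) % (1 << 64)) - (1 << 63)
    return wrapped, wrapped != full


print(smul_with_overflow_i64(2, 3))
print(smul_with_overflow_i64(INT64_MAX, 2))
```

With the libcalls disabled, the backend has to expand this check inline rather than call out to a runtime library.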
2023-11-23 19:34:50 +08:00
hev
7414c0db96
[LoongArch] Precommit a test for smul with overflow (NFC) (#73212) 2023-11-23 15:15:26 +08:00
ZhaoQi
775d2f3201
[LoongArch][MC] Support to get the FixupKind for BL (#72938)
Previously, bolt could not get FixupKind for BL correctly, because bolt
cannot get target-flags for BL. Here just add support in MCCodeEmitter.

Fixes https://github.com/llvm/llvm-project/pull/72826.
2023-11-21 19:00:29 +08:00
ZhaoQi
2ca028ce7c
[LoongArch][MC] Pre-commit tests for instr bl fixupkind testing (#72826)
This patch is used to test whether fixupkind for bl can be returned
correctly. When BL has target-flags(loongarch-call), there is no error.
But without this flag, an assertion error will appear. So the test is
just tagged as "Expectedly Failed" now until the following patch fixes it.
2023-11-21 08:34:52 +08:00
Lu Weining
78abc45c44
[LoongArch] Improve codegen for atomic cmpxchg ops (#69339)
PR #67391 improved atomic codegen by handling memory ordering specified
by the `cmpxchg` instruction. An acquire barrier needs to be generated
when memory ordering includes an acquire operation. This PR improves the
codegen further by only handling the failure ordering.
2023-10-19 09:21:51 +08:00