515 Commits

Author SHA1 Message Date
tangaac
bf6d52a2dd
[LoongArch] Pre-commit for vecreduce_add. (#154302) 2025-08-20 09:12:47 +08:00
tangaac
ccbcebcfd3
[LoongArch] Fix implicit PesudoXVINSGR2VR error (#152432)
According to the instruction manual, when `vr0` is changed, the high 128
bits of `xr0` are undefined.
Using `vinsgr2vr.b/h` to insert an `i8/i16` into the low 128 bits of a
256-bit vector may cause undefined behavior when the high 128 bits are
used in later instructions.
2025-08-19 17:22:00 +08:00
ZhaoQi
be3fd6ae25
[LoongArch] Use section-relaxable check instead of relax feature from STI (#153792)
In some cases, such as using `lto` or `llc`, the relax feature is not
available from this `SubtargetInfo` (`LoongArchAsmBackend` is
instantiated too early), causing loss of relocations.

This commit modifies the condition to check whether the section that
contains the two symbols is relaxable. If it is not relaxable, there is
no need to record relocations.
2025-08-19 09:48:51 +08:00
ZhaoQi
3acb7093c2
[LoongArch][NFC] Add tests for fixing missed addsub relocs when enabling relax (#154108) 2025-08-19 09:15:23 +08:00
ZhaoQi
8f671a675f
[LoongArch] Always emit symbol-based relocations regardless of relaxation (#153943)
This commit makes all relocations symbol-based.

Without this commit, errors may occur in some cases, such as when using
`llc/lto+relax`, or when combining relaxed and non-relaxed object files
using `ld -r`.

Some tests are updated.
2025-08-18 20:15:49 +08:00
ZhaoQi
6957e44d8e
[LoongArch][MC] Refine conditions for emitting ALIGN relocations (#153365)
According to the suggestions in
https://github.com/llvm/llvm-project/pull/150816, this commit refines the
conditions for emitting R_LARCH_ALIGN relocations.

Some existing tests are updated to avoid being affected by this
optimization. New tests are added to verify: removal of redundant ALIGN
relocations, ALIGN emitted after the first linker-relaxable instruction,
and conservatively emitted ALIGN in lower-numbered subsections.
2025-08-18 14:54:27 +08:00
tangaac
9315d701eb
[LoongArch] Optimize inserting extracted element for v4i64/v8i32 (#152629) 2025-08-14 17:06:50 +08:00
Trevor Gross
00c4be3c9e
[Test] Add and update tests for lrint/llrint (NFC) (#152662)
Many backends are missing either all tests for lrint, or specifically
those for f16, which currently crashes for `softPromoteHalf` targets.
For a number of popular backends, do the following:

* Ensure f16, f32, f64, and f128 are all covered
* Ensure both a 32- and 64-bit target are tested, if relevant
* Add `nounwind` to clean up CFI output
* Add a test covering the above if one did not exist
* Always specify the integer type in intrinsic calls

There are quite a few FIXMEs here, especially for `f16`, but much of
this will be resolved in the near future.
2025-08-12 09:56:51 +09:00
Qi Zhao
2f8e4f8b26 [LoongArch] Pre-commit tests for shuffle visiting same lane. NFC
PR: https://github.com/llvm/llvm-project/pull/151633.
2025-08-09 18:29:26 +08:00
tangaac
b05e26be8a
[LoongArch] Optimize extracting f32/f64 from 256-bit vector by using XVPICKVE. (#151914) 2025-08-06 09:11:34 +08:00
ZhaoQi
ece7a72aa2
[LoongArch] Optimize insertelement containing variable index using compare+select (#151131) 2025-07-30 18:06:41 +08:00
ZhaoQi
80e0d41677
[LoongArch] Custom legalizing build_vector with same constant elements (#150584) 2025-07-28 09:50:35 +08:00
ZhaoQi
f2a4cc1dd0
[LoongArch] Avoid expanding build_vector containing insertion of undef elements (#150377) 2025-07-26 14:24:39 +08:00
Qi Zhao
e3b5daf2db [LoongArch] Pre-commit tests for build_vector with same constant elements. NFC 2025-07-25 15:29:41 +08:00
Qi Zhao
afbf86e719 [LoongArch] Pre-commit tests for build_vector with undef elements inserting 2025-07-24 11:52:47 +08:00
ZhaoQi
ddf34b4c97
[LoongArch] Optimize general fp build_vector lowering (#149486) 2025-07-22 16:16:27 +08:00
ZhaoQi
cae7650558
[LoongArch] Optimize inserting fp element to vector (#149302)
Co-authored-by: tangaac <tangyan01@loongson.cn>
2025-07-22 13:38:46 +08:00
mintsuki
9ed8816dc6
LoongArch: Improve detection of valid TripleABI (#147952)
If the environment is considered to be the trailing triple component as
a whole, including the object format (if any), and that is the intended
behaviour, then the loongarch64 function `computeTargetABI()` should be
changed to not rely on `hasEnvironment()` but rather to check whether
a non-unknown environment is set.

Without this change, using an (ideally valid) target of
loongarch64-unknown-none-elf, with a manually specified ABI of lp64s,
will result in a completely superfluous warning:

```
warning: triple-implied ABI conflicts with provided target-abi 'lp64s', using target-abi
```
2025-07-22 12:13:37 +08:00
hev
8a307ae619
[LoongArch] Fix failure to widen operand for [X]VMSK{LT,GE,NE}Z (#149442)
Reported-by: tangyan <tangyan01@loongson.cn>
2025-07-21 16:36:49 +08:00
tangaac
64a0478e08
[LoongArch] Strengthen stack size estimation for LSX/LASX extension (#146455)
This patch adds an emergency spill slot for the case where the register
allocator runs out of registers.
PR #139201 introduced `vstelm` instructions, which have only an 8-bit
immediate offset; as a result, no spill slot was available to store the
spilled registers.
2025-07-18 16:12:11 +08:00
ZhaoQi
e74082703e
[LoongArch] Optimize inserting bitcasted integer element or bitcasting extracted fp element (#147043) 2025-07-17 19:21:24 +08:00
Matt Arsenault
f04650bb79
LoongArch: Add test for llvm.exp10 intrinsic (#148606) 2025-07-17 19:08:22 +09:00
ZhaoQi
efa5063ba7
[LoongArch] Optimize inserting element to high part of 256bits vector (#146816) 2025-07-17 17:52:12 +08:00
ZhaoQi
d218011159
[LoongArch] Optimize inserting extracted elements (#146018) 2025-07-17 15:44:49 +08:00
Matt Arsenault
3d50e1f3e8
RuntimeLibcalls: Add some tests for OpenBSD stack protectors (#147888)
7dce16f69dc3e26cb74d5ad38b0648a6f47f9640 removed a libcall for
STACKPROTECTOR_CHECK_FAIL from OpenBSD but added no tests.

Add a basic test copied from RISCV into all the backends on
the OpenBSD page of supported architectures before I potentially
break it in RuntimeLibcalls refactoring.
2025-07-15 15:50:54 +09:00
hev
eb0d61af6e
[LoongArch] Optimize 128-to-256-bit vector insertion and 256-to-128-bit subvector extraction (#146300)
This patch replaces stack-based accesses with register moves when
converting between 128-bit and 256-bit vectors. A 128-bit subvector
extract from, or insert to, the lower half of a 256-bit vector is now
treated as a subregister copy that needs no instruction.

Fixes #147769
2025-07-11 14:32:14 +08:00
hev
34b55e1807
[LoongArch] Precommit tests for 128-to-256-bit vector insertion and 256-to-128-bit subvector extraction (NFC) (#146299) 2025-07-11 11:15:17 +08:00
Matt Arsenault
3614d49499
LoongArch: Add test for sincos intrinsic (#147471) 2025-07-09 02:01:54 +09:00
Qi Zhao
9372f4050a [LoongArch] Pre-commit for optimizing bitcast extracted fp elements. NFC 2025-07-05 14:12:39 +08:00
Qi Zhao
ec752c6766 [LoongArch] Pre-commit tests for optimizing insert bitcast fp element 2025-07-04 19:11:33 +08:00
Guy David
76274eb2b3
[PHIElimination] Revert #131837 #146320 #146337 (#146850)
Reverting because mis-compiles:
- https://github.com/llvm/llvm-project/pull/131837
- https://github.com/llvm/llvm-project/pull/146320
- https://github.com/llvm/llvm-project/pull/146337
2025-07-03 07:48:08 -04:00
woruyu
bbcebec3af
[DAG] Refactor X86 combineVSelectWithAllOnesOrZeros fold into a generic DAG Combine (#145298)
This PR resolves https://github.com/llvm/llvm-project/issues/144513

The modification includes five patterns:
1. vselect Cond, 0, 0 → 0
2. vselect Cond, -1, 0 → bitcast Cond
3. vselect Cond, -1, x → or Cond, x
4. vselect Cond, x, 0 → and Cond, x
5. vselect Cond, 000..., X → andn Cond, X

Patterns 1-4 have been migrated to DAGCombine. Pattern 5 is still in the x86 code.
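
As an illustration only (not code from the PR), the five patterns listed above can be checked lane by lane in a small Python model. It assumes each `Cond` lane is either all-zeros or all-ones, and it uses 8-bit lanes; the helper names `vselect` and `check_folds` are hypothetical.

```python
MASK = 0xFF  # assumption: model every vector lane as an 8-bit integer


def vselect(cond, t, f):
    """Lane-wise select; cond lanes are 0x00 (false) or 0xFF (true)."""
    return [tv if c else fv for c, tv, fv in zip(cond, t, f)]


def check_folds(cond, x):
    """Check that each fold gives the same lanes as the original vselect."""
    zeros = [0] * len(cond)
    ones = [MASK] * len(cond)
    # 1. vselect Cond, 0, 0 -> 0
    assert vselect(cond, zeros, zeros) == zeros
    # 2. vselect Cond, -1, 0 -> bitcast Cond
    assert vselect(cond, ones, zeros) == cond
    # 3. vselect Cond, -1, x -> or Cond, x
    assert vselect(cond, ones, x) == [c | v for c, v in zip(cond, x)]
    # 4. vselect Cond, x, 0 -> and Cond, x
    assert vselect(cond, x, zeros) == [c & v for c, v in zip(cond, x)]
    # 5. vselect Cond, 0, x -> andn Cond, x, i.e. (~Cond) & x
    assert vselect(cond, zeros, x) == [(~c & MASK) & v for c, v in zip(cond, x)]
    return True
```

Calling `check_folds([0xFF, 0x00, 0xFF, 0x00], [0x12, 0x34, 0x56, 0x78])` runs all five checks on one sample vector; the equalities rely on the all-ones/all-zeros assumption for `Cond`.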

The reason is that the andn instruction cannot be used directly in
DAGCombine; only and+xor can be used, which introduces optimization
order issues. For example, in the x86 backend, for select Cond, 0, x →
(~Cond) & x, the backend first checks whether the cond node of
(~Cond) is a setcc node. If so, it modifies the comparison operator
of the condition. So the x86 backend cannot complete the optimization to
andn. In short, I think it is a better choice to keep the pattern of
vselect Cond, 000..., X instead of and+xor in combineDAG.

As for the commits, the first contains the code changes and x86 tests
(note 1); the second contains the tests for other backends (note 2).

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-07-02 15:07:48 +01:00
Qi Zhao
82c0a53763 [LoongArch] Pre-commit for optimizing insert extracted pair elements 2025-07-02 17:38:08 +08:00
Qi Zhao
66cc167dfa [LoongArch] Add tests for inserting extracted integer elements. NFC 2025-07-01 10:21:33 +08:00
Guy David
f5c62ee0fa
[PHIElimination] Reuse existing COPY in predecessor basic block (#131837)
The insertion point of COPY isn't always optimal and could eventually
lead to a worse block layout, see the regression test in the first
commit.

This change affects many architectures, but the total number of
instructions in the test cases seems to be slightly lower.
2025-06-29 21:28:42 +03:00
Qi Zhao
569fcac458 [LoongArch] Pre-commit tests for optimizing insert extracted fp elements 2025-06-27 11:19:06 +08:00
ZhaoQi
30e519e1ad
[LoongArch] Fix xvshuf instructions lowering (#145868)
Fix https://github.com/llvm/llvm-project/issues/137000.
2025-06-27 10:29:32 +08:00
Qi Zhao
a19ddff980 [LoongArch] Pre-commit test for fixing xvshuf instructions. NFC
For this test, the `xvshuf.d` instruction should not be generated.

This will be fixed later.
2025-06-26 18:48:30 +08:00
hev
4bb5e48fb9
[LoongArch] Add codegen support for ILP32D calling convention (#141539)
This patch adds codegen support for the calling convention defined by
the ILP32D ABI, which passes `f64` values using a soft-float mechanism.
Similar to RISC-V, it introduces pseudo-instructions to construct an
`f64` value from a pair of `i32`s, and to split an `f64` into two `i32`
values.
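
A rough Python sketch of what building an `f64` from two `i32`s and splitting it back means at the bit level. This is not the backend's pseudo-instruction implementation; the function names and the low-half-first ordering are assumptions made for illustration.

```python
import struct


def split_f64(value):
    """Split a double into (low 32 bits, high 32 bits) of its IEEE-754 encoding."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & 0xFFFFFFFF, bits >> 32


def build_f64(lo, hi):
    """Rebuild a double from the two 32-bit halves of its encoding."""
    bits = ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

For example, `build_f64(*split_f64(-2.5))` round-trips back to `-2.5`, since both helpers only reinterpret the same 64 bits.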
2025-06-25 21:00:29 +08:00
Xu Zhang
7c25db3fbf
[DAG] Fold (and X, (add (not Y), Z)) -> (and X, (not (sub Y, Z))). (#141476)
Fixes #140639
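
The fold rests on the two's-complement identity `(not Y) + Z == not (Y - Z)`. A brute-force check over 8-bit values, as a sketch only (the 8-bit width and the function names are assumptions, not part of the patch):

```python
from itertools import product

BITS = 8                 # assumption: check the identity at 8-bit width
MASK = (1 << BITS) - 1


def before(x, y, z):
    """(and X, (add (not Y), Z)) with wrap-around arithmetic."""
    return x & (((~y & MASK) + z) & MASK)


def after(x, y, z):
    """(and X, (not (sub Y, Z))) with wrap-around arithmetic."""
    return x & (~((y - z) & MASK) & MASK)


def fold_holds_for_all_8bit_inputs():
    """Exhaustively compare both forms for every Y, Z and two X masks."""
    return all(
        before(x, y, z) == after(x, y, z)
        for y, z in product(range(1 << BITS), repeat=2)
        for x in (MASK, 0xA5)
    )
```

Because `not Y` equals `-Y - 1` in two's complement, both sides reduce to `Z - Y - 1` before the `and`, which is why the check succeeds for every input.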

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-06-16 15:55:26 +01:00
hev
fe28ea37b6
[LoongArch] Add demanded bits support for [X]VMSKLTZ (#143528)
This patch adds a DAG combine hook for the [X]VMSKLTZ nodes to simplify
their input when possible. It also implements target-specific logic in
SimplifyDemandedBitsForTargetNode to optimize away unnecessary
computations when only a subset of the sign bits in the vector results
is actually used.
2025-06-12 18:39:16 +08:00
hev
acc43db9aa
[LoongArch] Convert vector mask to vXi1 using [X]VMSKLTZ (#142978)
This patch adds a DAG combine optimization that transforms `BITCAST`
nodes converting vector masks into `vXi1` types via the `[X]VMSKLTZ`
instructions.
2025-06-10 20:08:28 +08:00
Ami-zhang
0ed5d9aff6
[LoongArch][BF16] Add support for the __bf16 type (#142548)
The LoongArch psABI recently added __bf16 type support. Now we can
enable this new type in clang.

Currently, bf16 operations are automatically supported by promoting to
float. This patch adds bf16 support by ensuring that load extension /
truncate store operations are properly expanded.

And this commit implements support for bf16 truncate/extend on hard FP
targets. The extend operation is implemented by a shift just as in the
standard legalization. This requires custom lowering of the truncate
libcall on hard float ABIs (the normal libcall code path is used on soft
ABIs).
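
To illustrate why the extend operation can be a plain shift: a bf16 value is the upper 16 bits of the corresponding f32 encoding, so shifting left by 16 yields the f32 bit pattern. The helper below is a hypothetical sketch, not the lowering code from this patch.

```python
import struct


def bf16_bits_to_f32(bits16):
    """Extend a bf16 bit pattern to f32 by shifting it into the upper half."""
    f32_bits = (bits16 & 0xFFFF) << 16
    return struct.unpack("<f", struct.pack("<I", f32_bits))[0]
```

For example, the bf16 pattern `0x3F80` becomes the f32 pattern `0x3F800000`. The truncate direction needs rounding and is not modeled here.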
2025-06-09 11:15:41 +08:00
tangaac
90beda2aba
[LoongArch] Lower vector_shuffle as lane permute and shuffle for lasx if possible. (#141196) 2025-06-09 09:23:53 +08:00
Weining Lu
90a52f4942 [LoongArch] Pass OptLevel to LoongArchDAGToDAGISel correctly
This matches what many other targets do; see RISCV for a similar fix.

Fix https://github.com/llvm/llvm-project/issues/143239
2025-06-07 15:33:58 +08:00
Weining Lu
fcc82cfa93 [LoongArch] Precommit test case to show bug in LoongArchISelDagToDag
The optimization level should not be restored into O2.
2025-06-07 15:10:26 +08:00
hev
182c1c268f
[LoongArch][NFC] Pre-commit for converting vector mask to vXi1 using [X]VMSKLTZ (#142977) 2025-06-06 16:26:17 +08:00
hev
470f456567
[LoongArch] Add codegen support for atomic-ops on LA32 (#141557)
This patch adds codegen support for atomic operations `cmpxchg`, `max`,
`min`, `umax` and `umin` on the LA32 target.
2025-06-06 16:00:59 +08:00
hev
2718a47f49
[LoongArch] Lower vector select mask generation to [X]VMSK{LT,GE,NE}Z if possible (#142109)
This patch adds a DAG combine rule for BITCAST nodes converting from
vector `i1` masks generated by `setcc` into integer vector types. It
recognizes common select mask patterns and lowers them into efficient
LoongArch LSX/LASX mask instructions such as:

- [X]VMSKLTZ.{B,H,W,D}
- [X]VMSKGEZ.B
- [X]VMSKNEZ.B

When the vector comparison matches specific patterns (e.g., x < 0, x >=
0, x != 0, etc.), the transformation is performed pre-legalization. This
avoids scalarization and unnecessary operations, improving both
performance and code size.
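
A simplified Python model of the kind of mask these comparisons produce: one result bit per lane. It only sketches the idea under stated assumptions (lanes given as unsigned integers, bit `i` of the result corresponding to lane `i`); the function names are hypothetical and this is not the instructions' formal definition.

```python
def msk_ltz(lanes, lane_bits):
    """Set bit i when lane i is negative, i.e. its sign bit is set (x < 0)."""
    sign = 1 << (lane_bits - 1)
    mask = 0
    for i, lane in enumerate(lanes):
        if lane & sign:
            mask |= 1 << i
    return mask


def msk_gez(lanes, lane_bits):
    """Set bit i when lane i is non-negative (x >= 0)."""
    all_lanes = (1 << len(lanes)) - 1
    return ~msk_ltz(lanes, lane_bits) & all_lanes


def msk_nez(lanes):
    """Set bit i when lane i is non-zero (x != 0)."""
    mask = 0
    for i, lane in enumerate(lanes):
        if lane != 0:
            mask |= 1 << i
    return mask
```

With byte lanes `[0x80, 0x01, 0xFF, 0x00]`, lanes 0 and 2 have the sign bit set, so `msk_ltz` yields `0b0101`, `msk_gez` yields `0b1010`, and `msk_nez` yields `0b0111`.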
2025-06-05 22:17:38 +08:00
hev
d979423fb0
[LoongArch][NFC] Pre-commit for lowering vector mask generation to [X]VMSK{LT,GE,NE}Z (#142108) 2025-06-05 20:26:09 +08:00