According to the instruction manual, when `vr0` is modified, the high 128
bits of `xr0` are undefined.
Using `vinsgr2vr.b/h` to insert an `i8/i16` into the low 128 bits of a
256-bit vector may therefore cause undefined behavior when the high 128
bits are used by later instructions.
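As a hedged illustration (in generic SelectionDAG terms, not the literal patch), one safe lowering inserts into the low 128-bit half and writes that half back, so the high half is never left undefined; `Vec`, `Elt`, `Idx`, `DAG`, and `DL` are assumed to be in scope:

```cpp
// Sketch only: insert an i8 element into a v32i8 LASX vector without
// leaving the high 128 bits undefined.
SDValue Lo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v16i8, Vec,
                         DAG.getVectorIdxConstant(0, DL));
// vinsgr2vr.b only defines the low half, so do the element insert there...
Lo = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, MVT::v16i8, Lo, Elt, Idx);
// ...then write the half back, keeping the original high 128 bits intact.
SDValue Res = DAG.getNode(ISD::INSERT_SUBVECTOR, DL, MVT::v32i8, Vec, Lo,
                          DAG.getVectorIdxConstant(0, DL));
```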
In some cases, such as when using `lto` or `llc`, the relax feature is not
available from this `SubtargetInfo` (`LoongArchAsmBackend` is instantiated
too early), causing relocations to be lost.
This commit modifies the condition to check whether the section containing
the two symbols is relaxable. If it is not relaxable, there is no need to
record relocations.
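A minimal sketch of the new check, assuming a `MCSection::isLinkerRelaxable()` accessor (naming illustrative; `SymA` is one of the two symbols):

```cpp
// Only record the relocation pair for a symbol difference when the section
// holding both symbols can still change size at link time.
const MCSection &Sec = SymA.getSection();
if (!Sec.isLinkerRelaxable())
  return; // the difference is fixed at assembly time; no relocation needed
```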
This commit changes all relocations to be relocated with symbols.
Without this commit, errors may occur in some cases, such as when using
`llc/lto+relax`, or when combining relaxed and non-relaxed object files
using `ld -r`.
Some tests are updated.
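The core of the change can be sketched with the standard ELF writer hook (signature as in `MCELFObjectTargetWriter`; shown for illustration, not verbatim):

```cpp
// Keep the symbol in every relocation instead of folding it into the
// section symbol, so relaxed and non-relaxed objects combine safely.
bool LoongArchELFObjectWriter::needsRelocateWithSymbol(
    const MCValue &Val, const MCSymbol &Sym, unsigned Type) const {
  return true;
}
```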
Following the suggestions in
https://github.com/llvm/llvm-project/pull/150816, this commit refines the
conditions for emitting R_LARCH_ALIGN relocations.
Some existing tests are updated to avoid being affected by this
optimization. New tests are added to verify: removal of redundant ALIGN
relocations, ALIGN emitted after the first linker-relaxable instruction,
and conservatively emitted ALIGN in lower-numbered subsections.
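Conceptually, the refined condition looks like the following sketch (`SeenLinkerRelaxableInstr`, `InLowerNumberedSubsection`, and `emitAlignReloc` are hypothetical names, used only to illustrate the decision):

```cpp
// Alignment padding can be finalized by the assembler unless earlier bytes
// in the section may still change size at link time.
bool NeedAlignReloc =
    SeenLinkerRelaxableInstr ||  // code before the align may be relaxed
    InLowerNumberedSubsection;   // later subsections may be relaxable, so
                                 // stay conservative here
if (NeedAlignReloc)
  emitAlignReloc(ELF::R_LARCH_ALIGN);
```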
Many backends are missing either all tests for `lrint`, or specifically
those for `f16`, which currently crashes for `softPromoteHalf` targets.
For a number of popular backends, do the following:
* Ensure f16, f32, f64, and f128 are all covered
* Ensure both a 32- and 64-bit target are tested, if relevant
* Add `nounwind` to clean up CFI output
* Add a test covering the above if one did not exist
* Always specify the integer type in intrinsic calls
There are quite a few FIXMEs here, especially for `f16`, but much of
this will be resolved in the near future.
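As an illustration of the last list item, building the call through `IRBuilder` spells out the integer type in the mangled intrinsic name (e.g. `llvm.lrint.i64.f16`); `M`, `Builder`, and `X` are assumed to be in scope:

```cpp
// Create an explicitly typed llvm.lrint.i64.f16 call; the overload is
// mangled on both the result type and the operand type.
llvm::Function *LRint = llvm::Intrinsic::getDeclaration(
    M, llvm::Intrinsic::lrint,
    {Builder.getInt64Ty(), Builder.getHalfTy()});
llvm::Value *Res = Builder.CreateCall(LRint, {X}); // X is a 'half' value
```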
If the environment is considered to be the triple component as a whole
(that is, including the object format, if any), and if that is the
intended behaviour, then the loongarch64 function `computeTargetABI()`
should be changed to not rely on `hasEnvironment()`, but rather to check
whether a non-unknown environment is set.
Without this change, using an (ideally valid) target of
`loongarch64-unknown-none-elf` with a manually specified ABI of `lp64s`
results in a completely superfluous warning:
```
warning: triple-implied ABI conflicts with provided target-abi 'lp64s', using target-abi
```
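A sketch of the proposed check, using `llvm::Triple`'s public accessors:

```cpp
// hasEnvironment() is true for loongarch64-unknown-none-elf because the
// fourth component ("elf") is non-empty, yet the parsed environment is
// still "unknown". Checking the parsed value avoids the bogus warning.
if (TT.getEnvironment() != llvm::Triple::UnknownEnvironment) {
  // Only here is a triple-implied ABI meaningful to compare against the
  // user-provided target-abi.
}
```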
This patch adds an emergency spill slot for when the register scavenger
runs out of registers. PR #139201 introduced `vstelm` instructions with
only an 8-bit immediate offset, which can leave no spill slot available
for storing spilled registers.
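A sketch of reserving such a slot, modeled on the scavenging pattern other backends use in frame lowering (details illustrative; `MF`, `TRI`, and `RS` assumed in scope):

```cpp
// Reserve a stack slot the register scavenger can fall back on when no
// register is free to materialize an out-of-range offset.
MachineFrameInfo &MFI = MF.getFrameInfo();
const TargetRegisterClass &RC = LoongArch::GPRRegClass;
int FI = MFI.CreateStackObject(TRI->getSpillSize(RC),
                               TRI->getSpillAlign(RC),
                               /*isSpillSlot=*/false);
RS->addScavengingFrameIndex(FI);
```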
7dce16f69dc3e26cb74d5ad38b0648a6f47f9640 removed a libcall for
STACKPROTECTOR_CHECK_FAIL from OpenBSD but added no tests.
Add a basic test, copied from RISCV, to all the backends on the OpenBSD
page of supported architectures before I potentially break it in the
RuntimeLibcalls refactoring.
This patch replaces stack-based accesses with register moves when
converting between 128-bit and 256-bit vectors. A 128-bit subvector
extract from, or insert to, the lower half of a 256-bit vector is now
treated as a subregister copy that needs no instruction.
Fixes #147769
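A sketch of the low-half mapping, assuming a `sub_128` subregister index for the LSX view of an LASX register (the actual index name may differ):

```cpp
// Reading the low 128 bits of a 256-bit vector is just a subreg copy...
SDValue Lo = DAG.getTargetExtractSubreg(LoongArch::sub_128, DL,
                                        MVT::v4i32, Vec256);
// ...and writing the low half back is likewise a subreg insert; neither
// needs a real instruction.
SDValue Res = DAG.getTargetInsertSubreg(LoongArch::sub_128, DL,
                                        MVT::v8i32, Vec256, Lo);
```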
This PR resolves https://github.com/llvm/llvm-project/issues/144513
The modification includes five patterns:
1. `vselect Cond, 0, 0` → `0`
2. `vselect Cond, -1, 0` → `bitcast Cond`
3. `vselect Cond, -1, x` → `or Cond, x`
4. `vselect Cond, x, 0` → `and Cond, x`
5. `vselect Cond, 000..., X` → `andn Cond, X`
Patterns 1-4 have been migrated to DAGCombine; pattern 5 remains in x86
code. The reason is that the `andn` instruction cannot be used directly in
DAGCombine; only `and`+`xor` can be used, which introduces
optimization-order issues. For example, in the x86 backend, for
`select Cond, 0, x` → `(~Cond) & x`, the backend first checks whether the
cond node of `(~Cond)` is a `setcc` node and, if so, modifies the
comparison operator of the condition, so the x86 backend cannot complete
the `andn` optimization. In short, I think it is a better choice to keep
the `vselect Cond, 000..., X` pattern instead of `and`+`xor` in
DAGCombine.
As for the commits, the first contains the code changes and x86 tests
(note 1), and the second contains tests for other backends (note 2).
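A hedged sketch of folds 1-4 in generic DAGCombine terms (`Cond`, `N1`, `N2`, `VT`, `DAG`, and `DL` assumed; the bitcasts assume `Cond` has the same bit width as `VT`):

```cpp
if (ISD::isConstantSplatVectorAllZeros(N1.getNode()) &&
    ISD::isConstantSplatVectorAllZeros(N2.getNode()))
  return N2;                                  // (1) vselect C, 0, 0 -> 0
if (ISD::isConstantSplatVectorAllOnes(N1.getNode()) &&
    ISD::isConstantSplatVectorAllZeros(N2.getNode()))
  return DAG.getBitcast(VT, Cond);            // (2) vselect C, -1, 0 -> C
if (ISD::isConstantSplatVectorAllOnes(N1.getNode()))
  return DAG.getNode(ISD::OR, DL, VT,         // (3) vselect C, -1, x
                     DAG.getBitcast(VT, Cond), N2);
if (ISD::isConstantSplatVectorAllZeros(N2.getNode()))
  return DAG.getNode(ISD::AND, DL, VT,        // (4) vselect C, x, 0
                     DAG.getBitcast(VT, Cond), N1);
```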
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
The insertion point of COPY isn't always optimal and could eventually
lead to a worse block layout; see the regression test in the first
commit.
This change affects many architectures, but the total number of
instructions in the test cases seems to be slightly lower.
This patch adds codegen support for the calling convention defined by
the ILP32D ABI, which passes `f64` values using a soft-float mechanism.
Similar to RISC-V, it introduces pseudo-instructions to construct an
`f64` value from a pair of `i32`s, and to split an `f64` into two `i32`
values.
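Conceptually (pseudo names modeled on RISC-V's `BuildPairF64`/`SplitF64`; LoongArch's actual opcodes may differ), the lowering glues the two halves together and apart:

```cpp
// An f64 argument arrives in two i32 GPRs under ILP32D's soft-float
// passing; a pseudo rebuilds the f64 from the pair...
SDValue F64 = DAG.getNode(LoongArchISD::BUILD_PAIR_F64, DL, MVT::f64,
                          LoHalf, HiHalf);
// ...and on the outgoing path another pseudo splits it back into halves.
SDValue Halves = DAG.getNode(LoongArchISD::SPLIT_PAIR_F64, DL,
                             DAG.getVTList(MVT::i32, MVT::i32), F64);
```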
This patch adds a DAG combine hook for the [X]VMSKLTZ nodes to simplify
their input when possible. It also implements target-specific logic in
SimplifyDemandedBitsForTargetNode to optimize away unnecessary
computations when only a subset of the sign bits in the vector results
is actually used.
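A hedged sketch of the hook body: each result bit of [X]VMSKLTZ is the sign bit of one input lane, so only the sign bits of the demanded lanes must be preserved (a fragment from a `switch` inside `SimplifyDemandedBitsForTargetNode`; details illustrative):

```cpp
case LoongArchISD::VMSKLTZ: {
  SDValue Src = Op.getOperand(0);
  EVT SrcVT = Src.getValueType();
  unsigned NumElts = SrcVT.getVectorNumElements();
  // Result bit i comes from the sign bit of lane i; undemanded lanes are
  // irrelevant, and within a lane only the sign bit matters.
  APInt DemandedLanes = DemandedBits.zextOrTrunc(NumElts);
  APInt SignBitOnly = APInt::getSignMask(SrcVT.getScalarSizeInBits());
  KnownBits SrcKnown;
  if (SimplifyDemandedBits(Src, SignBitOnly, DemandedLanes, SrcKnown, TLO,
                           Depth + 1))
    return true;
  break;
}
```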
This patch adds a DAG combine optimization that transforms `BITCAST`
nodes converting vector masks into `vXi1` types via the `[X]VMSKLTZ`
instructions.
The LoongArch psABI recently added __bf16 type support. Now we can
enable this new type in clang.
Currently, bf16 operations are automatically supported by promoting to
float. This patch adds bf16 support by ensuring that load extension /
truncate store operations are properly expanded.
This commit also implements support for bf16 truncate/extend on hard-FP
targets. The extend operation is implemented by a shift, just as in the
standard legalization. This requires custom lowering of the truncate
libcall on hard-float ABIs (the normal libcall code path is used on
soft-float ABIs).
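A minimal sketch of the legalization hooks involved (standard `TargetLoweringBase` API; exact placement is illustrative):

```cpp
// Expand bf16 extending loads and truncating stores via f32/f64.
for (MVT VT : {MVT::f32, MVT::f64}) {
  setLoadExtAction(ISD::EXTLOAD, VT, MVT::bf16, Expand);
  setTruncStoreAction(VT, MVT::bf16, Expand);
}
// On hard-float ABIs the extend is a 16-bit left shift of the bit pattern
// and the truncate libcall needs custom lowering.
setOperationAction(ISD::BF16_TO_FP, MVT::f32, Custom);
setOperationAction(ISD::FP_TO_BF16, MVT::f32, Custom);
```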
This patch adds a DAG combine rule for BITCAST nodes converting from
vector `i1` masks generated by `setcc` into integer vector types. It
recognizes common select mask patterns and lowers them into efficient
LoongArch LSX/LASX mask instructions such as:
- [X]VMSKLTZ.{B,H,W,D}
- [X]VMSKGEZ.B
- [X]VMSKNEZ.B
When the vector comparison matches specific patterns (e.g., x < 0, x >=
0, x != 0, etc.), the transformation is performed pre-legalization. This
avoids scalarization and unnecessary operations, improving both
performance and code size.
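A hedged sketch of one recognized shape, in a BITCAST combine (`GRLenVT` and the surrounding plumbing are illustrative):

```cpp
// (bitcast (setcc x, 0, setlt) to iN) -> (VMSKLTZ x), adjusted to the
// destination integer type.
SDValue Src = N->getOperand(0);
if (Src.getOpcode() == ISD::SETCC &&
    cast<CondCodeSDNode>(Src.getOperand(2))->get() == ISD::SETLT &&
    ISD::isBuildVectorAllZeros(Src.getOperand(1).getNode())) {
  SDValue Msk =
      DAG.getNode(LoongArchISD::VMSKLTZ, DL, GRLenVT, Src.getOperand(0));
  return DAG.getZExtOrTrunc(Msk, DL, N->getValueType(0));
}
```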