According to the instruction manual, when `vr0` is modified, the high 128
bits of `xr0` are undefined.
Using `vinsgr2vr.b/h` to insert an `i8/i16` into the low 128 bits of a
256-bit vector may therefore cause undefined behavior when the high 128
bits are used by later instructions.
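As a hedged illustration (in generic SelectionDAG terms, not the literal patch), one safe lowering inserts into the low 128-bit half and writes that half back, so the high half is never left undefined; `Vec`, `Elt`, `Idx`, `DAG`, and `DL` are assumed to be in scope:

```cpp
// Sketch only: insert an i8 element into a v32i8 LASX vector without
// leaving the high 128 bits undefined.
SDValue Lo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v16i8, Vec,
                         DAG.getVectorIdxConstant(0, DL));
// vinsgr2vr.b only defines the low half, so do the element insert there...
Lo = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, MVT::v16i8, Lo, Elt, Idx);
// ...then write the half back, keeping the original high 128 bits intact.
SDValue Res = DAG.getNode(ISD::INSERT_SUBVECTOR, DL, MVT::v32i8, Vec, Lo,
                          DAG.getVectorIdxConstant(0, DL));
```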
In some cases, such as when using `lto` or `llc`, the relax feature is not
available from this `SubtargetInfo` (`LoongArchAsmBackend` is instantiated
too early), causing relocations to be lost.
This commit modifies the condition to check whether the section containing
the two symbols is relaxable. If it is not relaxable, there is no need to
record relocations.
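A minimal sketch of the new check, assuming a `MCSection::isLinkerRelaxable()` accessor (naming illustrative; `SymA` is one of the two symbols):

```cpp
// Only record the relocation pair for a symbol difference when the section
// holding both symbols can still change size at link time.
const MCSection &Sec = SymA.getSection();
if (!Sec.isLinkerRelaxable())
  return; // the difference is fixed at assembly time; no relocation needed
```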
This commit changes all relocations to be relocated with symbols.
Without this commit, errors may occur in some cases, such as when using
`llc/lto+relax`, or when combining relaxed and non-relaxed object files
using `ld -r`.
Some tests are updated.
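The core of the change can be sketched with the standard ELF writer hook (signature as in `MCELFObjectTargetWriter`; shown for illustration, not verbatim):

```cpp
// Keep the symbol in every relocation instead of folding it into the
// section symbol, so relaxed and non-relaxed objects combine safely.
bool LoongArchELFObjectWriter::needsRelocateWithSymbol(
    const MCValue &Val, const MCSymbol &Sym, unsigned Type) const {
  return true;
}
```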
Following the suggestions in
https://github.com/llvm/llvm-project/pull/150816, this commit refines the
conditions for emitting R_LARCH_ALIGN relocations.
Some existing tests are updated to avoid being affected by this
optimization. New tests are added to verify: removal of redundant ALIGN
relocations, ALIGN emitted after the first linker-relaxable instruction,
and conservatively emitted ALIGN in lower-numbered subsections.
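Conceptually, the refined condition looks like the following sketch (`SeenLinkerRelaxableInstr`, `InLowerNumberedSubsection`, and `emitAlignReloc` are hypothetical names, used only to illustrate the decision):

```cpp
// Alignment padding can be finalized by the assembler unless earlier bytes
// in the section may still change size at link time.
bool NeedAlignReloc =
    SeenLinkerRelaxableInstr ||  // code before the align may be relaxed
    InLowerNumberedSubsection;   // later subsections may be relaxable, so
                                 // stay conservative here
if (NeedAlignReloc)
  emitAlignReloc(ELF::R_LARCH_ALIGN);
```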
Many backends are missing either all tests for `lrint`, or specifically
those for `f16`, which currently crashes for `softPromoteHalf` targets.
For a number of popular backends, do the following:
* Ensure f16, f32, f64, and f128 are all covered
* Ensure both a 32- and 64-bit target are tested, if relevant
* Add `nounwind` to clean up CFI output
* Add a test covering the above if one did not exist
* Always specify the integer type in intrinsic calls
There are quite a few FIXMEs here, especially for `f16`, but much of
this will be resolved in the near future.
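As an illustration of the last list item, building the call through `IRBuilder` spells out the integer type in the mangled intrinsic name (e.g. `llvm.lrint.i64.f16`); `M`, `Builder`, and `X` are assumed to be in scope:

```cpp
// Create an explicitly typed llvm.lrint.i64.f16 call; the overload is
// mangled on both the result type and the operand type.
llvm::Function *LRint = llvm::Intrinsic::getDeclaration(
    M, llvm::Intrinsic::lrint,
    {Builder.getInt64Ty(), Builder.getHalfTy()});
llvm::Value *Res = Builder.CreateCall(LRint, {X}); // X is a 'half' value
```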
If the environment is considered to be the triple component as a whole
(that is, including the object format, if any), and if that is the
intended behaviour, then the loongarch64 function `computeTargetABI()`
should be changed to not rely on `hasEnvironment()`, but rather to check
whether a non-unknown environment is set.
Without this change, using an (ideally valid) target of
`loongarch64-unknown-none-elf` with a manually specified ABI of `lp64s`
results in a completely superfluous warning:
```
warning: triple-implied ABI conflicts with provided target-abi 'lp64s', using target-abi
```
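A sketch of the proposed check, using `llvm::Triple`'s public accessors:

```cpp
// hasEnvironment() is true for loongarch64-unknown-none-elf because the
// fourth component ("elf") is non-empty, yet the parsed environment is
// still "unknown". Checking the parsed value avoids the bogus warning.
if (TT.getEnvironment() != llvm::Triple::UnknownEnvironment) {
  // Only here is a triple-implied ABI meaningful to compare against the
  // user-provided target-abi.
}
```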
This patch adds an emergency spill slot for when the register scavenger
runs out of registers. PR #139201 introduced `vstelm` instructions with
only an 8-bit immediate offset, which can leave no spill slot available
for storing spilled registers.
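A sketch of reserving such a slot, modeled on the scavenging pattern other backends use in frame lowering (details illustrative; `MF`, `TRI`, and `RS` assumed in scope):

```cpp
// Reserve a stack slot the register scavenger can fall back on when no
// register is free to materialize an out-of-range offset.
MachineFrameInfo &MFI = MF.getFrameInfo();
const TargetRegisterClass &RC = LoongArch::GPRRegClass;
int FI = MFI.CreateStackObject(TRI->getSpillSize(RC),
                               TRI->getSpillAlign(RC),
                               /*isSpillSlot=*/false);
RS->addScavengingFrameIndex(FI);
```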
7dce16f69dc3e26cb74d5ad38b0648a6f47f9640 removed a libcall for
STACKPROTECTOR_CHECK_FAIL from OpenBSD but added no tests.
Add a basic test, copied from RISCV, to all the backends on the OpenBSD
page of supported architectures before I potentially break it in the
RuntimeLibcalls refactoring.
This patch replaces stack-based accesses with register moves when
converting between 128-bit and 256-bit vectors. A 128-bit subvector
extract from, or insert to, the lower half of a 256-bit vector is now
treated as a subregister copy that needs no instruction.
Fixes #147769
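A sketch of the low-half mapping, assuming a `sub_128` subregister index for the LSX view of an LASX register (the actual index name may differ):

```cpp
// Reading the low 128 bits of a 256-bit vector is just a subreg copy...
SDValue Lo = DAG.getTargetExtractSubreg(LoongArch::sub_128, DL,
                                        MVT::v4i32, Vec256);
// ...and writing the low half back is likewise a subreg insert; neither
// needs a real instruction.
SDValue Res = DAG.getTargetInsertSubreg(LoongArch::sub_128, DL,
                                        MVT::v8i32, Vec256, Lo);
```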
This PR resolves https://github.com/llvm/llvm-project/issues/144513
The modification includes five patterns:
1. `vselect Cond, 0, 0` → `0`
2. `vselect Cond, -1, 0` → `bitcast Cond`
3. `vselect Cond, -1, x` → `or Cond, x`
4. `vselect Cond, x, 0` → `and Cond, x`
5. `vselect Cond, 000..., X` → `andn Cond, X`
Patterns 1-4 have been migrated to DAGCombine; pattern 5 remains in x86
code. The reason is that the `andn` instruction cannot be used directly in
DAGCombine; only `and`+`xor` can be used, which introduces
optimization-order issues. For example, in the x86 backend, for
`select Cond, 0, x` → `(~Cond) & x`, the backend first checks whether the
cond node of `(~Cond)` is a `setcc` node and, if so, modifies the
comparison operator of the condition, so the x86 backend cannot complete
the `andn` optimization. In short, I think it is a better choice to keep
the `vselect Cond, 000..., X` pattern instead of `and`+`xor` in
DAGCombine.
As for the commits, the first contains the code changes and x86 tests
(note 1), and the second contains tests for other backends (note 2).
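A hedged sketch of folds 1-4 in generic DAGCombine terms (`Cond`, `N1`, `N2`, `VT`, `DAG`, and `DL` assumed; the bitcasts assume `Cond` has the same bit width as `VT`):

```cpp
if (ISD::isConstantSplatVectorAllZeros(N1.getNode()) &&
    ISD::isConstantSplatVectorAllZeros(N2.getNode()))
  return N2;                                  // (1) vselect C, 0, 0 -> 0
if (ISD::isConstantSplatVectorAllOnes(N1.getNode()) &&
    ISD::isConstantSplatVectorAllZeros(N2.getNode()))
  return DAG.getBitcast(VT, Cond);            // (2) vselect C, -1, 0 -> C
if (ISD::isConstantSplatVectorAllOnes(N1.getNode()))
  return DAG.getNode(ISD::OR, DL, VT,         // (3) vselect C, -1, x
                     DAG.getBitcast(VT, Cond), N2);
if (ISD::isConstantSplatVectorAllZeros(N2.getNode()))
  return DAG.getNode(ISD::AND, DL, VT,        // (4) vselect C, x, 0
                     DAG.getBitcast(VT, Cond), N1);
```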
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
The insertion point of COPY isn't always optimal and could eventually
lead to a worse block layout; see the regression test in the first
commit.
This change affects many architectures, but the total number of
instructions in the test cases seems to be slightly lower.
This patch adds codegen support for the calling convention defined by
the ILP32D ABI, which passes `f64` values using a soft-float mechanism.
Similar to RISC-V, it introduces pseudo-instructions to construct an
`f64` value from a pair of `i32`s, and to split an `f64` into two `i32`
values.
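Conceptually (pseudo names modeled on RISC-V's `BuildPairF64`/`SplitF64`; LoongArch's actual opcodes may differ), the lowering glues the two halves together and apart:

```cpp
// An f64 argument arrives in two i32 GPRs under ILP32D's soft-float
// passing; a pseudo rebuilds the f64 from the pair...
SDValue F64 = DAG.getNode(LoongArchISD::BUILD_PAIR_F64, DL, MVT::f64,
                          LoHalf, HiHalf);
// ...and on the outgoing path another pseudo splits it back into halves.
SDValue Halves = DAG.getNode(LoongArchISD::SPLIT_PAIR_F64, DL,
                             DAG.getVTList(MVT::i32, MVT::i32), F64);
```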
This patch adds a DAG combine hook for the [X]VMSKLTZ nodes to simplify
their input when possible. It also implements target-specific logic in
SimplifyDemandedBitsForTargetNode to optimize away unnecessary
computations when only a subset of the sign bits in the vector results
is actually used.
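A hedged sketch of the hook body: each result bit of [X]VMSKLTZ is the sign bit of one input lane, so only the sign bits of the demanded lanes must be preserved (a fragment from a `switch` inside `SimplifyDemandedBitsForTargetNode`; details illustrative):

```cpp
case LoongArchISD::VMSKLTZ: {
  SDValue Src = Op.getOperand(0);
  EVT SrcVT = Src.getValueType();
  unsigned NumElts = SrcVT.getVectorNumElements();
  // Result bit i comes from the sign bit of lane i; undemanded lanes are
  // irrelevant, and within a lane only the sign bit matters.
  APInt DemandedLanes = DemandedBits.zextOrTrunc(NumElts);
  APInt SignBitOnly = APInt::getSignMask(SrcVT.getScalarSizeInBits());
  KnownBits SrcKnown;
  if (SimplifyDemandedBits(Src, SignBitOnly, DemandedLanes, SrcKnown, TLO,
                           Depth + 1))
    return true;
  break;
}
```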
This patch adds a DAG combine optimization that transforms `BITCAST`
nodes converting vector masks into `vXi1` types via the `[X]VMSKLTZ`
instructions.
The LoongArch psABI recently added __bf16 type support. Now we can
enable this new type in clang.
Currently, bf16 operations are automatically supported by promoting to
float. This patch adds bf16 support by ensuring that load extension /
truncate store operations are properly expanded.
This commit also implements support for bf16 truncate/extend on hard-FP
targets. The extend operation is implemented by a shift, just as in the
standard legalization. This requires custom lowering of the truncate
libcall on hard-float ABIs (the normal libcall code path is used on
soft-float ABIs).
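A minimal sketch of the legalization hooks involved (standard `TargetLoweringBase` API; exact placement is illustrative):

```cpp
// Expand bf16 extending loads and truncating stores via f32/f64.
for (MVT VT : {MVT::f32, MVT::f64}) {
  setLoadExtAction(ISD::EXTLOAD, VT, MVT::bf16, Expand);
  setTruncStoreAction(VT, MVT::bf16, Expand);
}
// On hard-float ABIs the extend is a 16-bit left shift of the bit pattern
// and the truncate libcall needs custom lowering.
setOperationAction(ISD::BF16_TO_FP, MVT::f32, Custom);
setOperationAction(ISD::FP_TO_BF16, MVT::f32, Custom);
```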
This patch adds a DAG combine rule for BITCAST nodes converting from
vector `i1` masks generated by `setcc` into integer vector types. It
recognizes common select mask patterns and lowers them into efficient
LoongArch LSX/LASX mask instructions such as:
- [X]VMSKLTZ.{B,H,W,D}
- [X]VMSKGEZ.B
- [X]VMSKNEZ.B
When the vector comparison matches specific patterns (e.g., x < 0, x >=
0, x != 0, etc.), the transformation is performed pre-legalization. This
avoids scalarization and unnecessary operations, improving both
performance and code size.
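A hedged sketch of one recognized shape, in a BITCAST combine (`GRLenVT` and the surrounding plumbing are illustrative):

```cpp
// (bitcast (setcc x, 0, setlt) to iN) -> (VMSKLTZ x), adjusted to the
// destination integer type.
SDValue Src = N->getOperand(0);
if (Src.getOpcode() == ISD::SETCC &&
    cast<CondCodeSDNode>(Src.getOperand(2))->get() == ISD::SETLT &&
    ISD::isBuildVectorAllZeros(Src.getOperand(1).getNode())) {
  SDValue Msk =
      DAG.getNode(LoongArchISD::VMSKLTZ, DL, GRLenVT, Src.getOperand(0));
  return DAG.getZExtOrTrunc(Msk, DL, N->getValueType(0));
}
```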