382 Commits

Author SHA1 Message Date
WÁNG Xuěruì
f246b5f547
[LoongArch] Support bswap for LSX/LASX VTs (#114171)
On top of #114170
2024-11-01 00:38:13 +08:00
hev
f7a96dc664
[LoongArch] Ensure pcaddu18i and jirl adjacency in tail calls for correct relocation (#113932)
Prior to this patch, both `pcaddu18i` and `jirl` were marked as
scheduling boundaries to prevent instruction reordering that would
disrupt their adjacency. However, in certain cases, epilogues were still
being inserted between these two instructions, breaking the required
proximity. This patch ensures that `pcaddu18i` and `jirl` remain
adjacent even in the presence of epilogues, maintaining correct
relocation behavior for tail calls on LoongArch.
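
As an illustration only (not part of the patch), the sketch below models how a `pcaddu18i` + `jirl` pair reaches a target together. It assumes the usual ISA semantics (`pcaddu18i` adds a sign-extended 20-bit immediate shifted left by 18 to the PC; `jirl` adds a sign-extended 16-bit immediate shifted left by 2) and assumes the linker splits the offset with a `0x20000` rounding term; the helper names `split_call36` and `resolve` are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def split_call36(offset):
    """Split a PC-relative byte offset into (si20, offs16) immediates.

    Assumption: the high part is rounded with +0x20000 so that the
    sign-extended low part brings the sum back to the exact offset.
    """
    assert offset % 4 == 0, "instruction addresses are 4-byte aligned"
    hi = (offset + 0x20000) >> 18
    assert -(1 << 19) <= hi < (1 << 19), "offset out of range for the pair"
    lo = (offset & 0x3FFFF) >> 2
    return hi & 0xFFFFF, lo


def resolve(pc, si20, offs16):
    """Compute the jump target produced by the two-instruction sequence."""
    tmp = pc + (sign_extend(si20, 20) << 18)     # pcaddu18i tmp, si20
    return tmp + (sign_extend(offs16, 16) << 2)  # jirl $zero, tmp, offs16


pc = 0x120000000
for offset in (0, 4, -4, 0x20000, -0x20000, 0x12345678):
    assert resolve(pc, *split_call36(offset)) == pc + offset
```

Because the two immediates only make sense as a pair, anything placed between the instructions (such as an epilogue) would separate the halves that a single relocation is expected to patch.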
2024-11-01 00:08:15 +08:00
WÁNG Xuěruì
5581e43a2b
[LoongArch][NFC] Pre-commit tests for LSX/LASX bswap codegen (#114170) 2024-10-31 21:10:26 +08:00
WANG Rui
862074fa57 [LoongArch][NFC] Pre-commit tests for the adjacency of expanded pseudo-insns 2024-10-31 16:59:41 +08:00
Ami-zhang
1897bf61f0
[LoongArch] Enable FeatureExtLSX for generic-la64 processor (#113421)
This commit makes the `generic` target support FP and LSX, as
discussed in #110211. As a result, 128-bit vectors are enabled by
default in the loongarch64 backend.
2024-10-31 15:58:15 +08:00
hev
b225b15a3d
[LoongArch] Merge base and offset for large offsets (#113277)
This PR merges large offsets into the base address loading.
2024-10-23 19:43:23 +08:00
tangaac
5b9c76b6e7
[LoongArch] Support LoongArch-specific amswap[_db].{b/h} and amadd[_db].{b/h} instructions (#113255)
Two options for clang: -mlam-bh & -mno-lam-bh.
Enable or disable amswap[_db].{b/h} and amadd[_db].{b/h} instructions.
The default is -mno-lam-bh.
Only works on LoongArch64.
2024-10-23 16:03:15 +08:00
WANG Rui
4614b80c49 [LoongArch] Pre-commit tests for merge base with large offset. NFC 2024-10-22 15:44:40 +08:00
tangaac
ba5676cf91
[LoongArch] Minor refinement to monotonic atomic semantics. (#112681)
Don't use "_db" version AM instructions for LoongArch atomic memory
operations with monotonic semantics.
2024-10-21 15:58:35 +08:00
Alex Rønne Petersen
ad4a582fd9
[llvm] Consistently respect naked fn attribute in TargetFrameLowering::hasFP() (#106014)
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.

Note: I don't have commit access.
2024-10-18 09:35:42 +04:00
tangaac
e9eec14bb3
[LoongArch] [CodeGen] Add options for Clang to generate LoongArch-specific frecipe & frsqrte instructions (#109917)
Two options: `-mfrecipe` & `-mno-frecipe`.
Enable or disable frecipe.{s/d} and frsqrte.{s/d} instructions.
The default is `-mno-frecipe`.
2024-10-18 09:06:29 +08:00
wanglei
4c2c177567
[LoongArch] Add options for annotate tablejump
This aligns with GCC. LoongArch kernel developers requested that this
option generate some corresponding relations in a section, including the
addresses of the jump instruction (jr) and the `MachineJumpTableEntry`.

Reviewed By: heiher

Pull Request: https://github.com/llvm/llvm-project/pull/102411
2024-10-16 11:58:00 +08:00
WANG Rui
8e3cde04cb [LoongArch][test] Add float-point atomic load/store tests. NFC 2024-09-25 15:39:22 +08:00
Robert Dazi
8837898b8d
[DAGCombine] Count leading ones: refine post DAG/Type Legalisation if promotion (#102877)
This PR is related to #99591. In this PR, instead of modifying how the
legalisation occurs depending on surrounding instructions, we refine
after legalisation.

This PR has two parts:

* `SDPatternMatch/MatchContext`: Slightly modify the code that matches
Operands (used by `m_Node(...)`) and Unary/Binary/Ternary Patterns to
make it compatible with `VPMatchContext`, instead of supporting only
`m_Opc`. Some tests were added to ensure no regressions.
* `DAGCombiner`: Add a `foldSubCtlzNot` which detects and rewrites the
patterns using a matching context.

Remaining Tasks:

- [ ] GlobalISel
- [ ] Currently the pattern matching will occur even before
legalisation. Should I restrict it to specific stages instead?
- [ ] Style: Add a visitVP_SUB? Move `foldSubCtlzNot` to another
location for style consistency?

@topperc

---------

Co-authored-by: v01dxyz <v01dxyz@v01d.xyz>
2024-09-15 15:48:36 +04:00
YANG Xudong
13280d99ae
[loongarch][DAG][FREEZE] Fix crash when FREEZE a half(f16) type on loongarch (#107791)
For zig with LLVM 19.1.0rc4, we are seeing the following error when
bootstrapping a `loongarch64-linux-musl` target.


https://github.com/ziglang/zig-bootstrap/issues/164#issuecomment-2332357069

It seems that this issue is caused by `PromoteFloatResult` not
handling FREEZE OP on loongarch.

Here is the reproduction of the error: https://godbolt.org/z/PPfvWjjG5

~~This patch adds the FREEZE OP handling with `PromoteFloatRes_UnaryOp`
and adds a test case.~~

This patch changes loongarch's way of floating point promotion to soft
promotion to avoid this problem.

See: loongarch's handling of `half`:
- https://github.com/llvm/llvm-project/issues/93894
- https://github.com/llvm/llvm-project/pull/94456

Also see: other float promotion FREEZE handling
- 0019c2f194
2024-09-13 08:49:54 +08:00
Lu Weining
ffcebcdb96
[LoongArch] Implement Statepoint lowering (#108212)
The functionality has been validated in OpenHarmony's arkcompiler.
2024-09-12 18:05:13 +08:00
hev
0f47e3aebd
[LoongArch] Eliminate the redundant sign extension of division (#107971)
If all incoming values of `div.d` are sign-extended and all users only
use the lower 32 bits, then convert them to W versions.

Fixes: #107946
2024-09-10 16:52:21 +08:00
wanglei
1ca411ca45
[LoongArch] Codegen for concat_vectors with LASX
Fixes: #107355

Reviewed By: SixWeining

Pull Request: https://github.com/llvm/llvm-project/pull/107523
2024-09-10 09:28:15 +08:00
Yingwei Zheng
a111f9119a
[LoongArch][ISel] Check the number of sign bits in PatGprGpr_32 (#107432)
After https://github.com/llvm/llvm-project/pull/92205, LoongArch ISel
selects `div.w` for `trunc i64 (sdiv i64 3202030857, (sext i32 X to
i64)) to i32`. It is incorrect since `3202030857` is not a signed 32-bit
constant. It produces a wrong result when `X == 2`:
https://alive2.llvm.org/ce/z/pzfGZZ

This patch adds additional `sexti32` checks to operands of
`PatGprGpr_32`.
Alive2 proof: https://alive2.llvm.org/ce/z/AkH5Mp

Fix #107414.
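
For illustration (not part of the patch), the miscompile can be reproduced arithmetically. The sketch assumes a 32-bit divide that sees only the low 32 bits of each operand as a signed value; the helper names are hypothetical.

```python
def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def sdiv(a, b):
    """Signed division truncating toward zero, like LLVM's sdiv."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


dividend = 3202030857  # does not fit in a signed 32-bit integer
x = 2

# What the IR asks for: 64-bit divide, then truncate to i32.
expected = sext(sdiv(dividend, sext(x, 32)), 32)

# Assumed model of the narrowed divide: both operands read as signed i32.
narrowed = sdiv(sext(dividend, 32), sext(x, 32))

print(expected, narrowed)
```

The two values differ because the low 32 bits of `3202030857` read as a negative number, which is why the pattern needs both operands to be known sign-extended 32-bit values.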
2024-09-10 09:19:39 +08:00
anjenner
4af249fe6e
Add usub_cond and usub_sat operations to atomicrmw (#105568)
These both perform conditional subtraction, returning the minuend and
zero respectively, if the difference is negative.
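
A minimal model of the value each operation stores, written as plain functions over unsigned integers (the function names and the `bits` parameter are illustrative; the real operations are atomic and return the old value):

```python
def usub_cond_new_value(old, val, bits=32):
    """Value stored by usub_cond: subtract only if it does not underflow."""
    mask = (1 << bits) - 1
    old, val = old & mask, val & mask
    return old - val if old >= val else old


def usub_sat_new_value(old, val, bits=32):
    """Value stored by usub_sat: subtract, clamping at zero on underflow."""
    mask = (1 << bits) - 1
    old, val = old & mask, val & mask
    return old - val if old >= val else 0


print(usub_cond_new_value(5, 7), usub_sat_new_value(5, 7))
```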
2024-09-06 16:19:20 +01:00
wanglei
df93327c1a
[LoongArch] Legalize ISD::CTPOP for GRLenVT type with LSX
Reviewed By: SixWeining

Pull Request: https://github.com/llvm/llvm-project/pull/106941
2024-09-06 15:46:43 +08:00
wanglei
4b2c950de5
[test][LoongArch] Pre-commit test for optimize CTPOP. NFC
Reviewed By: SixWeining

Pull Request: https://github.com/llvm/llvm-project/pull/106940
2024-09-06 15:45:23 +08:00
wanglei
eaf87d3275 [LoongArch] Optimize for immediate value materialization using BSTRINS_D instruction
Reviewed By: heiher, SixWeining

Pull Request: https://github.com/llvm/llvm-project/pull/106332
2024-08-30 16:38:42 +08:00
wanglei
5b77e254e8
[LoongArch] Pre-commit test for immediate value materialization using BSTRINS_D
Reviewed By: SixWeining

Pull Request: https://github.com/llvm/llvm-project/pull/106331
2024-08-30 16:37:20 +08:00
Stephen Tozer
3d08ade7bd
[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149)
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:

https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850

This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).

Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
2024-08-29 17:53:32 +01:00
Weining Lu
63267ca901 [LoongArch] Fix the assertion for atomic store with 'ptr' type 2024-08-19 17:17:36 +08:00
hev
985d64b03a
[LoongArch] Merge base and offset for LSX/LASX memory accesses (#104452) 2024-08-19 15:23:05 +08:00
WANG Rui
82cf6558e5 [LoongArch] Pre-commit tests for validating the merge base offset in vectors. NFC 2024-08-15 21:06:27 +08:00
YunQiang Su
fb9e685fc4
Intrinsic: introduce minimumnum and maximumnum for IR and SelectionDAG (#96649)
C23 introduced new functions fminimum_num and fmaximum_num, and they
follow the minimumNumber and maximumNumber of IEEE754-2019. Let's
introduce new intrinsics to support them.

This patch introduces support only for scalar values. Support for
  vector (vp, vp.reduce, vector.reduce),
  experimental.constrained
will be added in future patches.

With this patch, MIPSr6 and LoongArch can work out of the box with
fcanonical and fmax/fmin.

AArch64/PowerPC64 can use the same logic as MIPSr6 and LoongArch, but
they have no fcanonical support yet.
I will add it in future patches.

The RISC-V FMIN/FMAX instructions follow the
minimumNumber/maximumNumber of IEEE754-2019. We can just add it in a
future patch.

Background

https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735
Currently we have fminnum/fmaxnum, which behave differently on
different platforms for NUM vs sNaN:
   1) Fallback to fmin(3)/fmax(3): return qNaN.
   2) ARM64/ARM32+Neon: same as libc.
   3) MIPSr6/LoongArch/RISC-V: return NUM.

And the fix of fminnum/fmaxnum to follow minNUM/maxNUM of IEEE754-2008
will be submitted as separate patches.
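
As a rough illustration of the minimumNumber/maximumNumber behaviour described above (not code from the patch), the sketch below returns the numeric operand when exactly one input is NaN and orders -0.0 below +0.0. Python floats cannot distinguish sNaN from qNaN, so that distinction is not modelled; the function names simply mirror the intrinsic names.

```python
import math


def minimumnum(a, b):
    """Smaller operand; a lone NaN is ignored; -0.0 is less than +0.0."""
    if math.isnan(a) and math.isnan(b):
        return math.nan
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == b:  # also covers the signed-zero case
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b


def maximumnum(a, b):
    """Larger operand; a lone NaN is ignored; +0.0 is greater than -0.0."""
    if math.isnan(a) and math.isnan(b):
        return math.nan
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == b:  # also covers the signed-zero case
        return a if math.copysign(1.0, a) > 0 else b
    return a if a > b else b


print(minimumnum(math.nan, 1.0), maximumnum(-0.0, 0.0))
```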
2024-08-15 14:09:36 +08:00
Peter Rong
74e4694b8c
[LTO] enable ObjCARCContractPass only on optimized build (#101114)
#92331 tried to enable `ObjCARCContractPass` by default, but it caused a
regression on O0 builds and was reverted.
This patch tries to bring that back by:

1. Reverting the [revert](1579e9ca9c).
2. Calling `createObjCARCContractPass` only on optimized builds.

Tests are updated to reflect the changes. Specifically, `O0` tests
should not include `ObjCARCContractPass`.

Signed-off-by: Peter Rong <PeterRong@meta.com>
2024-08-09 13:04:25 -07:00
hev
dbae30df24
[LoongArch] Load floating-point immediate using VLDI (#101923)
This commit uses the VLDI instruction to load some common floating-point
constants when the LSX feature is enabled.
2024-08-09 14:08:32 +08:00
hev
b2e69f52bb
[LoongArch] Add machine function pass to merge base + offset (#101139)
This commit follows the RISC-V implementation to add a machine function
pass that merges the base address and offset.
2024-08-08 23:05:38 +08:00
Alexis Engelke
fa92d51f9e
[VP] Merge ExpandVP pass into PreISelIntrinsicLowering (#101652)
Similar to #97727; avoid an extra pass over the entire IR by performing
the lowering as part of the pre-isel-intrinsic-lowering pass.
2024-08-06 09:27:59 +02:00
WANG Rui
5f7e921fe3 [LoongArch] Pre-commit test for load floating-point immediate using VLDI. NFC 2024-08-05 11:47:27 +08:00
hev
8b26c02caa
[LoongArch] Align stack objects passed to memory intrinsics (#101309)
Memcpy, and other memory intrinsics, typically try to use wider
load/store if the source and destination addresses are aligned. In
CodeGenPrepare, look for calls to memory intrinsics and, if the object
is on the stack, align it to 4-byte (32-bit) or 8-byte (64-bit)
boundaries if it is large enough that we expect memcpy to use wider
load/store instructions to copy it.

Fixes #101295
2024-08-02 11:28:03 +08:00
Alexis Engelke
b5fc083dc3
[CodeGen] Merge lowerConstantIntrinsics into pre-isel lowering (#97727)
Currently, the LowerConstantIntrinsics pass does an RPO traversal of
every function... only to find that many functions don't have constant
intrinsics (is.constant, objectsize). In the CodeGen pipeline, there is
already a pre-isel intrinsic lowering pass, which iterates over
intrinsic declarations and lowers all users. Call
lowerConstantIntrinsics from this pass to avoid the extra iteration over
the entire IR and the RPO traversal.
2024-08-01 17:44:32 +02:00
WANG Rui
f51a479520 [LoongArch] Pre-commit test for aligning stack objects passed to memory intrinsics. NFC 2024-08-01 17:17:28 +08:00
Craig Topper
307d1249ea
[LegalizeTypes][RISCV][LoongArch] Optimize promotion of ucmp. (#101366)
ucmp can be promoted with either sext or zext. RISC-V and LoongArch
prefer sext for promoting i32 to i64 unless the inputs are known to be
zero extended already.

This patch uses the existing SExtOrZExtPromotedOperands function that is
used by SETCC promotion to intelligently handle this.
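
The claim that either extension is valid can be checked with a small model (illustrative only; the helper names are hypothetical): sign-extending both i32 operands to 64 bits preserves their unsigned ordering, so the three-way unsigned comparison gives the same answer as with zero extension.

```python
def zext32_to_64(v):
    """Zero-extend a 32-bit value to an unsigned 64-bit value."""
    return v & 0xFFFFFFFF


def sext32_to_64(v):
    """Sign-extend a 32-bit value, returned as an unsigned 64-bit pattern."""
    v &= 0xFFFFFFFF
    return (v | 0xFFFFFFFF00000000) if (v & 0x80000000) else v


def ucmp(a, b):
    """Three-way unsigned comparison: -1, 0 or 1."""
    return (a > b) - (a < b)


samples = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFE, 0xFFFFFFFF]
for a in samples:
    for b in samples:
        assert ucmp(sext32_to_64(a), sext32_to_64(b)) == ucmp(
            zext32_to_64(a), zext32_to_64(b)
        )
```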
2024-07-31 17:18:27 -07:00
WANG Rui
84ad292f34 [LoongArch] Pre-commit tests for merge base offset. NFC 2024-07-30 14:45:25 +08:00
hev
0e6f64cd5e
[LoongArch] Reimplement to prevent Pseudo{CALL, LA*}_LARGE instruction reordering (#100099)
The Pseudo{CALL, LA*}_LARGE instruction patterns specified in psABI
v2.30 cannot be reordered. This patch sets scheduling boundaries for
these instructions to prevent reordering. The Pseudo{CALL, LA*}_LARGE
instruction is moved back to Pre-RA expansion, which will help with
subsequent address calculation optimizations.
2024-07-30 14:22:53 +08:00
hev
3e2631c9c6
[LoongArch] Optimize codegen for ISD::ROTL (#100344)
The LoongArch rotr.{w,d} instruction ignores the high bits of the shift
operand, allowing it to generate more efficient code using the constant
zero register.
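
One reading of this, sketched below as an assumption rather than a description of the actual patterns: because the rotate ignores the high bits of its shift operand, a left rotate by `n` can be expressed as a right rotate by `0 - n`, so the amount only needs a subtraction from the zero register instead of computing `width - n`. The function names are hypothetical.

```python
def rotr(x, amount, bits=64):
    """Rotate right, using only the low log2(bits) bits of amount."""
    mask = (1 << bits) - 1
    amount &= bits - 1  # high bits of the shift operand are ignored
    x &= mask
    return ((x >> amount) | (x << (bits - amount))) & mask


def rotl_reference(x, amount, bits=64):
    """Straightforward rotate left, used as the reference result."""
    mask = (1 << bits) - 1
    amount %= bits
    x &= mask
    return ((x << amount) | (x >> (bits - amount))) & mask


def rotl_via_rotr(x, amount, bits=64):
    """Rotate left expressed as a rotate right by the negated amount."""
    return rotr(x, 0 - amount, bits)


for bits in (32, 64):
    for amount in (0, 1, 5, bits - 1, bits, bits + 3):
        value = 0x0123456789ABCDEF
        assert rotl_via_rotr(value, amount, bits) == rotl_reference(value, amount, bits)
```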
2024-07-30 14:22:24 +08:00
hev
e386aacb74
[LoongArch] Fix codegen for ISD::ROTR (#100292)
This patch fixes the code generation for IR:

sext i32 (trunc i64 (rotr i64 %x, i64 %y) to i32) to i64
2024-07-24 12:08:43 +08:00
WANG Rui
9d1d0cc020 [LoongArch][test] Revert "Pre-commit for fix codegen for ISD::ROTR". NFC
This reverts commit bc829b501d0ffa93019d29b0294e998d3dbb3d7a.
2024-07-24 11:00:50 +08:00
WANG Rui
bc829b501d [LoongArch][test] Pre-commit for fix codegen for ISD::ROTR. NFC 2024-07-24 10:46:58 +08:00
Ami-zhang
fcec298087
[LoongArch] Support la664 (#100068)
A new ProcessorModel called `la664` is defined in LoongArch.td to
support `-march/-mtune=la664`.
2024-07-23 15:14:20 +08:00
Zhaoxin Yang
464ea880cf
[LoongArch][CodeGen] Implement 128-bit and 256-bit vector shuffle. (#100054)
[LoongArch][CodeGen] Implement 128-bit and 256-bit vector shuffle
operations.

In LoongArch, shuffle operations can be divided into two types:
- Single-vector shuffle: Shuffle using only one vector, with the other
vector being `undef` or not selected by mask. This can be expanded to
instructions such as `vreplvei` and `vshuf4i`.
- Two-vector shuffle: Shuffle using two vectors. This can be expanded to
instructions like `vilv[l/h]`, `vpack[ev/od]`, `vpick[ev/od]` and the
basic `vshuf`.

In the future, more optimizations may be added, such as handling 1-bit
vectors and processing single element patterns, etc.
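
The two categories above can be told apart from the shuffle mask alone. The sketch below is an illustration with hypothetical helper names, not the backend's code, and it does not model which instruction is finally selected. It assumes the usual mask convention: `-1` is undef, `0..n-1` selects from the first vector and `n..2n-1` from the second.

```python
def classify_shuffle(mask, num_elts):
    """Return 'single-vector' or 'two-vector' for a shuffle mask."""
    uses_first = any(0 <= m < num_elts for m in mask)
    uses_second = any(m >= num_elts for m in mask)
    return "two-vector" if uses_first and uses_second else "single-vector"


def apply_shuffle(a, b, mask):
    """Reference result of the shuffle; undef lanes are shown as None."""
    source = list(a) + list(b)
    return [None if m < 0 else source[m] for m in mask]


a = [10, 11, 12, 13]
b = [20, 21, 22, 23]
print(classify_shuffle([1, 1, 1, 1], 4), apply_shuffle(a, b, [1, 1, 1, 1]))
print(classify_shuffle([0, 4, 1, 5], 4), apply_shuffle(a, b, [0, 4, 1, 5]))
```

A mask that reads only from the second vector, or that is partly undef, still counts as single-vector in this model, matching the "other vector being `undef` or not selected by mask" wording.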
2024-07-23 12:06:59 +08:00
WANG Rui
87c35d7827 [LoongArch][test] Add --relocation-model=pic option to psabi-restricted-scheduling. NFC
Add --relocation-model=pic option for generating %gd_pc_hi20 and %ld_pc_hi20.
2024-07-23 11:32:01 +08:00
hev
4c73b1a986
[LoongArch] Recommit "Remove spurious mask operations from andn->icmp on 16 and 8 bit values" (#99798)
recommit of #99272
2024-07-22 15:10:21 +08:00
WANG Rui
aefe411dae [LoongArch] Add a test for spurious mask removal. NFC
Link: https://github.com/llvm/llvm-project/pull/99272#issuecomment-2241348794
2024-07-21 13:42:26 +08:00
hev
1d5d18924d
Revert "[LoongArch] Remove spurious mask operations from andn->icmp on 16 and 8 bit values" (#99792)
Reverts llvm/llvm-project#99272
2024-07-21 10:17:20 +08:00