Prior to this patch, both `pcaddu18i` and `jirl` were marked as
scheduling boundaries to prevent instruction reordering that would
disrupt their adjacency. However, in certain cases, epilogues were still
being inserted between these two instructions, breaking the required
proximity. This patch ensures that `pcaddu18i` and `jirl` remain
adjacent even in the presence of epilogues, maintaining correct
relocation behavior for tail calls on LoongArch.
This commit makes the `generic` target to support FP and LSX, as
discussed in #110211. Thereby, it allows 128-bit vector to be enabled by
default in the loongarch64 backend.
Two options for clang: -mlam-bh & -mno-lam-bh.
Enable or disable amswap[__db].{b/h} and amadd[__db].{b/h} instructions.
The default is -mno-lam-bh.
Only works on LoongArch64.
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.
Note: I don't have commit access.
This aligns with GCC. LoongArch kernel developers requested that this
option generate some corresponding relations in a section, including the
addresses of the jump instruction(jr) and the `MachineJumpTableEntry`.
Reviewed By: heiher
Pull Request: https://github.com/llvm/llvm-project/pull/102411
This PR is related to #99591. In this PR, instead of modifying how the
legalisation occurs depending on surrounding instructions, we refine
after legalisation.
This PR has two parts:
* `SDPatternMatch/MatchContext`: Modify a little bit the code to match
Operands (used by `m_Node(...)`) and Unary/Binary/Ternary Patterns to
make it compatible with `VPMatchContext`, instead of only `m_Opc`
supported. Some tests were added to ensure no regressions.
* `DAGCombiner`: Add a `foldSubCtlzNot` which detect and rewrite the
patterns using matching context.
Remaining Tasks:
- [ ] GlobalISel
- [ ] Currently the pattern matching will occur even before
legalisation. Should I restrict it to specific stages instead ?
- [ ] Style: Add a visitVP_SUB ?? Move `foldSubCtlzNot` in another
location for style consistency purpose ?
@topperc
---------
Co-authored-by: v01dxyz <v01dxyz@v01d.xyz>
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:
https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850
This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
C23 introduced new functions fminimum_num and fmaximum_num, and they
follow the minimumNumber and maximumNumber of IEEE754-2019. Let's
introduce new intrinsics to support them.
This patch introduces support only support for scalar values. The
support of
vector (vp, vp.reduce, vector.reduce),
experimental.constrained
will be added in future patches.
With this patch, MIPSr6 and LoongArch can work out of box with
fcanonical and fmax/fmin.
Aarch64/PowerPC64 can use the same login as MIPSr6 and LoongArch, while
they have no fcanonical support yet.
I will add it in future patches.
The FMIN/FMAX of RISC-V instructions follows the
minimumNumber/maximumNumber of IEEE754-2019. We can just add it in
future patch.
Background
https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735
Currently we have fminnum/fmaxnum, which have different behavior on
different platform for NUM vs sNaN:
1) Fallback to fmin(3)/fmax(3): return qNaN.
2) ARM64/ARM32+Neon: same as libc.
3) MIPSr6/LoongArch/RISC-V: return NUM.
And the fix of fminnum/fmaxnum to follow minNUM/maxNUM of IEEE754-2008
will submit as separated patches.
\#92331 tried to make `ObjCARCContractPass` by default, but it caused a
regression on O0 builds and was reverted.
This patch trys to bring that back by:
1. reverts the
[revert](1579e9ca9c).
2. `createObjCARCContractPass` only on optimized builds.
Tests are updated to refelect the changes. Specifically, all `O0` tests
should not include `ObjCARCContractPass`
Signed-off-by: Peter Rong <PeterRong@meta.com>
Memcpy, and other memory intrinsics, typically try to use wider
load/store if the source and destination addresses are aligned. In
CodeGenPrepare, look for calls to memory intrinsics and, if the object
is on the stack, align it to 4-byte (32-bit) or 8-byte (64-bit)
boundaries if it is large enough that we expect memcpy to use wider
load/store instructions to copy it.
Fixes#101295
Currently, the LowerConstantIntrinsics pass does an RPO traversal of
every function... only to find that many functions don't have constant
intrinsics (is.constant, objectsize). In the CodeGen pipeline, there is
already a pre-isel intrinsic lowering pass, which iterates over
intrinsic declarations and lowers all users. Call
lowerConstantIntrinsics from this pass to avoid the extra iteration over
the entire IR and the RPO traversal.
ucmp can be promoted with either sext or zext. RISC-V and LoongArch
prefer sext for promoting i32 to i64 unless the inputs are known to be
zero extended already.
This patch uses the existing SExtOrZExtPromotedOperands function that is
used by SETCC promotion to intelligently handle this.
The Pseudo{CALL, LA*}_LARGE instruction patterns specified in psABI
v2.30 cannot be reordered. This patch sets scheduling boundaries for
these instructions to prevent reordering. The Pseudo{CALL, LA*}_LARGE
instruction is moved back to Pre-RA expansion, which will help with
subsequent address calculation optimizations.
The LoongArch rotr.{w,d} instruction ignores the high bits of the shift
operand, allowing it to generate more efficient code using the constant
zero register.
[LoongArch][CodeGen] Implement 128-bit and 256-bit vector shuffle
operations.
In LoongArch, shuffle operations can be divided into two types:
- Single-vector shuffle: Shuffle using only one vector, with the other
vector being `undef` or not selected by mask. This can be expanded to
instructions such as `vreplvei` and `vshuf4i`.
- Two-vector shuffle: Shuflle using two vectors. This can be expanded to
instructions like `vilv[l/h]`, `vpack[ev/od]`, `vpick[ev/od]` and the
basic `vshuf`.
In the future, more optimizations may be added, such as handling 1-bit
vectors and processing single element patterns, etc.