Before this patch, redundant COPY couldn't be removed for the following
case:
```
$R0 = OP ...
... // Read of %R0
$R1 = COPY killed $R0
```
This patch adds support for tracking the users of the source register
during backward propagation, so that we can remove the redundant COPY in
the above case and optimize it to:
```
$R1 = OP ...
... // Replace all uses of %R0 with $R1
```
The previous behavior could be harmful in some edge cases, such as
emitting a call to `fma()` in the `fma()` implementation itself.
Do this by just being more accurate in `isFMAFasterThanFMulAndFAdd()`.
This was already done for PowerPC; this commit just extends that to Arm,
z/Arch, and x86. MIPS and SPARC already got it right, but I added tests
for them too, for good measure.
Note: I don't have commit access.
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.
Note: I don't have commit access.
Some of these check lines are insufficient to determine correctness.
Generate full check lines instead.
To reduce noise, add nounwind and use static relocation model.
Fix issue1: In mips1-4, require a minimum of 2 instructions between a
mflo/mfhi and the next mul/dmult/div/ddiv/divu/ddivu instruction.
Fix issue2: In mips1-4, should not put mflo into the delay slot for the
return.
Fix https://github.com/llvm/llvm-project/issues/81291
Minor improvement on cc39c3b17fb2598e20ca0854f9fe6d69169d85c7.
Use an aligned stack slot to store the shifted value.
Use the native register width as shifting unit, so the load of the
shift result is aligned.
If the shift amount is a multiple of the native register width, there is
no need to do a follow-up shift after the load. I added new tests for
these cases.
Co-authored-by: Gergely Futo <gergely.futo@hightec-rt.com>
We encountered this problem in Zig, causing all of our
`mips(el)-linux-gnueabi*` tests to fail:
https://github.com/ziglang/zig/issues/21215
For these unusual cases, let's just bail in `MipsFastISel` since
`MipsTargetLowering` can handle them fine.
Note: I don't have commit access.
MIPSr6 has max.s/max.d/min.s/min.d instructions, which can be used as
fcanonicalize.
For pre-R6, we have no instructions that can fcanonicalize an float, so
let's use `fadd Y,X,X` to quiet it if it is NaN.
IEEE754-2008 requires that the result of general-computational and
quiet-computational operation shouldn't be signal NaN.
We need to mask the SRL result to 8 bits before ORing in the SLL. This
is needed in case bits 23:16 of the input aren't zero. They will have
been shifted into bits 15:8.
We don't need to AND the result with 0xffff. It's ok if the upper 16
bits of the register are garbage.
Fixes#103035.
C23 introduced new functions fminimum_num and fmaximum_num, and they
follow the minimumNumber and maximumNumber of IEEE754-2019. Let's
introduce new intrinsics to support them.
This patch introduces support only support for scalar values. The
support of
vector (vp, vp.reduce, vector.reduce),
experimental.constrained
will be added in future patches.
With this patch, MIPSr6 and LoongArch can work out of box with
fcanonical and fmax/fmin.
Aarch64/PowerPC64 can use the same login as MIPSr6 and LoongArch, while
they have no fcanonical support yet.
I will add it in future patches.
The FMIN/FMAX of RISC-V instructions follows the
minimumNumber/maximumNumber of IEEE754-2019. We can just add it in
future patch.
Background
https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735
Currently we have fminnum/fmaxnum, which have different behavior on
different platform for NUM vs sNaN:
1) Fallback to fmin(3)/fmax(3): return qNaN.
2) ARM64/ARM32+Neon: same as libc.
3) MIPSr6/LoongArch/RISC-V: return NUM.
And the fix of fminnum/fmaxnum to follow minNUM/maxNUM of IEEE754-2008
will submit as separated patches.
If the upper bits of the shr aren't demanded.
This helps with cases where the outer srl was originally an sra and was
converted to a srl by SimplifyDemandedBits before it had a chance to
combine with the inner sra. This can occur when the inner sra was part
of a sign_extend_inreg expansion.
There are some regressions in ARM and Thumb2.
1. Add MipsPat to optimize (andi (srl (truncate i64 $1), x), y) to (andi
(truncate (dsrl i64 $1, x)), y).
2. Add MipsPat to optimize (ext (truncate i64 $1), x, y) to (truncate
(dext i64 $1, x, y)).
The assembly result is the same as gcc.
Fixes https://github.com/llvm/llvm-project/issues/42826
As far as I can tell, this pull request was not approved, and
did not go through an RFC on discourse.
This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
Currently, on different platform, the behaivor of llvm.minnum is
different if one operand is sNaN:
When we compare sNaN vs NUM:
ARM/AArch64/PowerPC: follow the IEEE754-2008's minNUM: return qNaN.
RISC-V/Hexagon follow the IEEE754-2019's minimumNumber: return NUM. X86:
Returns NUM but not same with IEEE754-2019's minimumNumber as
+0.0 is not always greater than -0.0.
MIPS/LoongArch/Generic: return NUM.
LIBCALL: returns qNaN.
So, let's introduce llvm.minmumnum/llvm.maximumnum, which always follow
IEEE754-2019's minimumNumber/maximumNumber.
Half-fix: #93033
Remove support for the icmp and fcmp constant expressions.
This is part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179
As usual, many of the updated tests will no longer test what they were
originally intended to -- this is hard to preserve when constant
expressions get removed, and in many cases just impossible as the
existence of a specific kind of constant expression was the cause of the
issue in the first place.
This is similar to 373c343a, but for targets with zero-or-negative-one
booleans.
The difference in tests is mostly due to G_SEXT_INREG being illegal for
some targets, in which case it gets expanded into G_SHL/G_ASHR pair,
which is not currently optimized by the combiner.
MIPS max.fmt/min.fmt instructions is IEEE2008 compatiable. If either
argument is sNaN, the result will be NaN.
So we define fminnum_ieee instead of fminnum in Mips32r6InstrInfo.td. We
also should define fcanonicalize. So that we can define fminnum as
expand to fcanonicalize and fminnum_ieee.
MSA registers share the FPRs as its bottom half. So that we can use MSA
instructions to work with normal float/double:
double a, b, c;
asm volatile ("fmadd.d %w0, %w1, %w2" : "+f"(a) : "f"(b), "f"(c));
GCC has support it for quite long time.
Gas uses encoding DW_EH_PE_absptr for PIC, and gnu ld converts it to
DW_EH_PE_sdata4|DW_EH_PE_pcrel.
LLD doesn't have this workarounding, thus complains
```
relocation R_MIPS_32 cannot be used against local symbol; recompile with -fPIC
relocation R_MIPS_64 cannot be used against local symbol; recompile with -fPIC
```
So, let's generates asm/obj files with `DW_EH_PE_sdata4|DW_EH_PE_pcrel`
encoding. In fact, GNU ld supports such OBJs well.
For N64, maybe we should use sdata8, while GNU ld doesn't support it
well, and in fact sdata4 is enough now. So we just ignore the `Large`
for `MCObjectFileInfo::initELFMCObjectFileInfo`. Maybe we should switch
back to sdata8 once GNU LD supports it well.
Fixes: #58377.
- The behavior is similar to UCOMISD on x86, which is also used to
compare two fp values, specifically on handling of NaNs.
- Update related tests regarding this change.
- The further goal is to implement `llvm.minimum` and `llvm.maximum`
intrinsics for MIPS R6 and Pre-R6.
Part of https://github.com/llvm/llvm-project/issues/64207
The current code is so buggy: it can work for few cases. The problems
include:
1. ll/sc works on a whole word, while other parts other than we rmw are
dropped.
2. The oprands are not well zero-extended for unsigned ops.
3. It doesn't work for big-endian, as the postion of subword differs
with little endian.
And in fact, we can set the return value correct in ll/sc scope, so we
can skip the sinkMBB.
Here we introduce three new GMIR instructions to cover a set of trap
intrinsics. The idea behind it is that generic intrinsics shouldn't be
used with G_INTRINSIC opcode.
These new instructions can match perfectly with existing trap ISD nodes.
It allows X86, AArch64, RISCV and Mips to reuse SelectionDAG patterns for
selection and avoid manual selection. However AMDGPU is an exception. It
selects traps during legalization regardless SelectionDAG or GlobalISel.
Since there are not many places where traps are used, this change
attempts to clean up all the usages of G_INTRINSIC with trap intrinsics. So,
there is no stage when both G_TRAP and
G_INTRINSIC_W_SIDE_EFFECTS(@llvm.trap) are allowed.
MIPSr6 ISA requires normal load/store instructions support
misunaligned memory access, while it is not always do so
by hardware. On some microarchitectures or some corner cases
it may need support by OS.
Don't confuse with pre-R6's lwl/lwr famlily: MIPSr6 doesn't
support them, instead, r6 requires lw instruction support
misunaligned memory access. So, if -mstrict-align is used for
pre-R6, lwl/lwr won't be disabled.
If -mstrict-align is used for r6 and the access is not well
aligned, some lb/lh instructions will be used to replace lw.
This is useful for OS kernels.
To be back-compatible with GCC, -m(no-)unaligned-access are also
added as Neg-Alias of -m(no-)strict-align.
- Use computeMaxCallFrameSize() in PEI::calculateCallFrameInfo() instead of duplicating the code.
- Set AdjustsStack in FinalizeISel instead of in computeMaxCallFrameSize().
This include 2 fixes:
1. Disallow 'f' for softfloat.
2. Allow 'r' for softfloat.
Currently, 'f' is accpeted by clang, then LLVM meets an internal error.
'r' is rejected by LLVM by: couldn't allocate input reg for constraint
'r'.
Fixes: #64241, #63632
---------
Co-authored-by: Fangrui Song <i@maskray.me>
…n MIPS
Modify:
Add a global variable 'CurForbiddenSlotAttr' to save current
instruction's forbidden slot and whether set reorder. This is the
judgment condition for whether to add nop. We would add a couple of
'.set noreorder' and '.set reorder' to wrap the current instruction and
the next instruction.
Then we can get previous instruction`s forbidden slot attribute and
whether set reorder by 'CurForbiddenSlotAttr'.
If previous instruction has forbidden slot and .set reorder is active
and current instruction is CTI. Then emit a NOP after it.
Fix https://github.com/llvm/llvm-project/issues/61045.
Because https://reviews.llvm.org/D158589 was 'Needs Review' state, not
ending, so we commit pull request again.
When parsing the `la` macro, we add a duplicate `$` prefix in
`getOrCreateSymbol`,
leading to `error: Undefined temporary symbol $$yy` for code like:
```
xx:
la $2,$yy
$yy:
nop
```
Remove the duplicate prefix.
In addition, recognize `.L`-prefixed symbols as local for O32.
See: #65020.
---------
Co-authored-by: Fangrui Song <i@maskray.me>
- Enable equivalent between `brcond` and `G_BRCOND`.
- Remove the manual selection of `G_BRCOND` in Mips. Revise test cases.
Reviewers: petar-avramovic, bcardosolopes, arsenm
Reviewed By: arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/81306
FastISel may create a redundant BGTZ terminal which fallthroughes.
```
BGTZ %2:gpr32, %bb.1, implicit-def $at
bb.1.bb1:
; predecessors: %bb.0
```
The `!I->isBarrier()` check in
MipsAsmPrinter::isBlockOnlyReachableByFallthrough
will incorrectly not print a label, leading to a `Undefined temporary
symbol `
error when we try assembling the output assembly file. See the updated
`Fast-ISel/pr40325.ll` and
https://github.com/rust-lang/rust/issues/108835
In addition, the `SwitchInst` condition is too conservative and prints
many unneeded labels (see the updated tests).
Just use the generic isBlockOnlyReachableByFallthrough, updated by
commit 1995b9fead62f2f6c0ad217bd00ce3184f741fdb for SPARC, which also
handles MIPS.
* Placing 'G' before 'M' (SHF_MERGE) can be misleading as the sh_entsize
argument goes before the section group name, if a reader doesn't know
that the order of extra arguments is not affected by the order of flags.
* 'a', 'w', and 'x' indicate basic permission-related flags. Separating
them with 'G' is kinda ugly.
Simplify code and move 'G' after 'o'. The new output is more similar to
GCC.
When both SHF_LINK_ORDER | SHF_GROUP flags are set, GNU assembler from
2.35 onwards (https://sourceware.org/PR25381https://sourceware.org/binutils/docs/as/Section.html) parses the
SHF_LINK_ORDER argument before section group name, different from us.
This is unfortunate, but does not matter because the `.section` flag `o`
is a niche feature only used by compiler instrumentations, not adopted
by hand-written assembly, and using both flags is extremely rare. Let's
just match GNU assembler. There is another benefit: we now support
zero-flag section group with the SHF_LINK_ORDER flag, while previously
there isn't a syntax.
While here, print 'G' after 'o' to be clear that the 'G' argument is
parsed after the 'o' argument. To make the diff smaller, we don't print
'G' after 'w' in the absence of 'o' for now.
Do optimization to turn x >> (shift & 31/63) into a single srlv instead
of andi + srlv, since the mips variable shift instruction already
implicitly masks the shift, like x86, wasm and AMDGPU. Copy the
X86DAGToDAGISel::isUnneededShiftMask() function to MIPS for checking
whether need combine two instructions to one.