1786 Commits

Author SHA1 Message Date
Hervé Poussineau
6fa1647a47
[MC][Mips] Rename MipsMCAsmInfo to MipsELFMCAsmInfo (#112592)
Also change MipsAsmPrinter::emitStartOfAsmFile to emit ELF-related
sections only when using ELF output file format.
2024-11-01 08:42:34 +08:00
Vladimir Radosavljevic
401d123a1f
[MCP] Optimize copies when src is used during backward propagation (#111130)
Before this patch, redundant COPY couldn't be removed for the following
case:
```
  $R0 = OP ...
  ... // Read of %R0
  $R1 = COPY killed $R0
```
This patch adds support for tracking the users of the source register
during backward propagation, so that we can remove the redundant COPY in
the above case and optimize it to:
```
  $R1 = OP ...
  ... // Replace all uses of %R0 with $R1
```
2024-10-23 13:37:02 +02:00
Alex Rønne Petersen
5785cbb405
[llvm] Ensure that soft float targets don't emit fma() libcalls. (#106615)
The previous behavior could be harmful in some edge cases, such as
emitting a call to `fma()` in the `fma()` implementation itself.

Do this by just being more accurate in `isFMAFasterThanFMulAndFAdd()`.
This was already done for PowerPC; this commit just extends that to Arm,
z/Arch, and x86. MIPS and SPARC already got it right, but I added tests
for them too, for good measure.

Note: I don't have commit access.
2024-10-19 06:13:15 -07:00
Alex Rønne Petersen
ad4a582fd9
[llvm] Consistently respect naked fn attribute in TargetFrameLowering::hasFP() (#106014)
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.

Note: I don't have commit access.
2024-10-18 09:35:42 +04:00
Nikita Popov
9f81acf4ef [Mips] Regenerate test checks (NFC)
Some of these check lines are insufficient to determine correctness.
Generate full check lines instead.

To reduce noise, add nounwind and use static relocation model.
2024-10-01 14:49:14 +02:00
yingopq
debc325bb1
[MIPS] Fix failing to legalize load+call with vector of non-p2 integer (#109625)
Add a condition to check whether the vector element type is a power of 2.

Fixes #102870.
2024-09-24 09:38:38 +02:00
yingopq
677177bb60
[Mips] Fix mfhi/mflo hazard miscompilation about div and mult (#91449)
Fix issue1: In mips1-4, require a minimum of 2 instructions between a
mflo/mfhi and the next mul/dmult/div/ddiv/divu/ddivu instruction.
Fix issue2: In mips1-4, should not put mflo into the delay slot for the
return.

Fix https://github.com/llvm/llvm-project/issues/81291
2024-09-23 19:07:13 +08:00
futog
3e0a76b1fd
[Codegen][LegalizeIntegerTypes] Improve shift through stack (#96151)
Minor improvement on cc39c3b17fb2598e20ca0854f9fe6d69169d85c7.

Use an aligned stack slot to store the shifted value.
Use the native register width as shifting unit, so the load of the
shift result is aligned.

If the shift amount is a multiple of the native register width, there is
no need to do a follow-up shift after the load. I added new tests for
these cases.

Co-authored-by: Gergely Futo <gergely.futo@hightec-rt.com>
2024-09-23 11:45:43 +02:00
yingopq
72cacf1d99
[MIPS] Fix -msingle-float doesn't work with double on O32 (#107543)
Skip the following function 'CustomLowerNode' when the operand had done
`SoftenFloatResult`.

Fix #93052
2024-09-20 07:37:18 +08:00
anbbna
b847076f55
[Mips] Add test file for 'xor' and 'and' instructions (#106679)
Part of #99783 

This test is meant to reflect the oncoming change as this test shows the
unoptimized result with unnecessary SLLs.
2024-09-20 07:34:38 +08:00
yingopq
1ad84d7961
[Mips] Optimize or (and $src1, mask), (shl $src2, shift) to ins (#103017)
Optimize `$dst = or (and $src1, (2**size0 - 1)), (shl $src2, size0)` to
`ins $src1, $src2, pos, size`,
where `pos = size0, size = 32 - pos`.

Fix #90325
2024-09-13 00:05:54 +08:00
Alex Rønne Petersen
c0b3e491cc
[llvm][Mips] Bail on underaligned loads/stores in FastISel. (#106231)
We encountered this problem in Zig, causing all of our
`mips(el)-linux-gnueabi*` tests to fail:
https://github.com/ziglang/zig/issues/21215

For these unusual cases, let's just bail in `MipsFastISel` since
`MipsTargetLowering` can handle them fine.

Note: I don't have commit access.
2024-09-12 22:10:19 +08:00
YunQiang Su
c641b611f8
MIPSr6: Add llvm.is.fpclasss intrinsic support (#107857)
MIPSr6 has class.s/class.d instructions.
Let's use them for llvm.is.fpclass intrinsic.
2024-09-11 09:37:12 +08:00
YunQiang Su
1e153461c6
MIPS: Add fcanonicalize for pre-R6 (#104554)
MIPSr6 has max.s/max.d/min.s/min.d instructions, which can be used as
fcanonicalize.

For pre-R6, we have no instructions that can fcanonicalize an float, so
let's use `fadd Y,X,X` to quiet it if it is NaN.

IEEE754-2008 requires that the result of general-computational and
quiet-computational operation shouldn't be signal NaN.
2024-08-27 17:13:46 +08:00
Craig Topper
ebe7265b14
[Mips] Fix fast isel for i16 bswap. (#103398)
We need to mask the SRL result to 8 bits before ORing in the SLL. This
is needed in case bits 23:16 of the input aren't zero. They will have
been shifted into bits 15:8.

We don't need to AND the result with 0xffff. It's ok if the upper 16
bits of the register are garbage.

Fixes #103035.
2024-08-16 14:54:51 -07:00
YunQiang Su
fb9e685fc4
Intrinsic: introduce minimumnum and maximumnum for IR and SelectionDAG (#96649)
C23 introduced new functions fminimum_num and fmaximum_num, and they
follow the minimumNumber and maximumNumber of IEEE754-2019. Let's
introduce new intrinsics to support them.

This patch introduces support only support for scalar values. The
support of
  vector (vp, vp.reduce, vector.reduce),
  experimental.constrained
will be added in future patches.

With this patch, MIPSr6 and LoongArch can work out of box with
fcanonical and fmax/fmin.

Aarch64/PowerPC64 can use the same login as MIPSr6 and LoongArch, while
they have no fcanonical support yet.
I will add it in future patches.

The FMIN/FMAX of RISC-V instructions follows the
minimumNumber/maximumNumber of IEEE754-2019. We can just add it in
future patch.

Background

https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735
Currently we have fminnum/fmaxnum, which have different behavior on
different platform for NUM vs sNaN:
   1) Fallback to fmin(3)/fmax(3): return qNaN.
   2) ARM64/ARM32+Neon: same as libc.
   3) MIPSr6/LoongArch/RISC-V: return NUM.

And the fix of fminnum/fmaxnum to follow minNUM/maxNUM of IEEE754-2008
will submit as separated patches.
2024-08-15 14:09:36 +08:00
Craig Topper
abc1acf8df
[TargetLowering][AMDGPU][ARM][RISCV][X86] Teach SimplifyDemandedBits to combine (srl (sra X, C1), ShAmt) -> sra(X, C1+ShAmt) (#101751)
If the upper bits of the shr aren't demanded.

This helps with cases where the outer srl was originally an sra and was
converted to a srl by SimplifyDemandedBits before it had a chance to
combine with the inner sra. This can occur when the inner sra was part
of a sign_extend_inreg expansion.

There are some regressions in ARM and Thumb2.
2024-08-14 08:44:57 -07:00
Craig Topper
91c3a718b2
[Mips] ISel zext nneg the same as sext for Mips64. (#102852)
Fixes #62587.
2024-08-12 13:47:27 -07:00
yingopq
e711a0c80f
[MIPS] Fix missing ANDI optimization (#97689)
1. Add MipsPat to optimize (andi (srl (truncate i64 $1), x), y) to (andi
(truncate (dsrl i64 $1, x)), y).
2. Add MipsPat to optimize (ext (truncate i64 $1), x, y) to (truncate
(dext i64 $1, x, y)).

The assembly result is the same as gcc.

Fixes https://github.com/llvm/llvm-project/issues/42826
2024-08-09 18:55:21 +01:00
yingopq
5fb20024e2
[Mips] Add test for AND optimization (#102278)
See https://github.com/llvm/llvm-project/issues/42826
2024-08-07 20:55:13 +01:00
Nikita Popov
f2f18459d4 Revert "Intrinsic: introduce minimumnum and maximumnum (#93841)"
As far as I can tell, this pull request was not approved, and
did not go through an RFC on discourse.

This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
2024-06-21 08:34:04 +02:00
YunQiang Su
8988148003
Intrinsic: introduce minimumnum and maximumnum (#93841)
Currently, on different platform, the behaivor of llvm.minnum is
different if one operand is sNaN:

When we compare sNaN vs NUM:

ARM/AArch64/PowerPC: follow the IEEE754-2008's minNUM: return qNaN.
RISC-V/Hexagon follow the IEEE754-2019's minimumNumber: return NUM. X86:
Returns NUM but not same with IEEE754-2019's minimumNumber as
     +0.0 is not always greater than -0.0.
MIPS/LoongArch/Generic: return NUM.
LIBCALL: returns qNaN.

So, let's introduce llvm.minmumnum/llvm.maximumnum, which always follow
IEEE754-2019's minimumNumber/maximumNumber.

Half-fix: #93033
2024-06-21 11:53:08 +08:00
Thorsten Schütt
b1f9440fa9
[GlobalIsel] Import GEP flags (#93850)
https://github.com/llvm/llvm-project/pull/90824
2024-06-14 20:56:43 +02:00
Nikita Popov
deab451e7a
[IR] Remove support for icmp and fcmp constant expressions (#93038)
Remove support for the icmp and fcmp constant expressions.

This is part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179

As usual, many of the updated tests will no longer test what they were
originally intended to -- this is hard to preserve when constant
expressions get removed, and in many cases just impossible as the
existence of a specific kind of constant expression was the cause of the
issue in the first place.
2024-06-04 08:31:03 +02:00
paperchalice
9b0e1c2ca2
[NewPM][CodeGen] Port finalize-isel to new pass manager (#94214)
It should preserve more analysis results, but it happens immediately
after instruction selection.
2024-06-04 09:23:52 +08:00
Sergei Barannikov
3fee8b3469
[GISel] LegalizationArtifactCombiner: Elide redundant G_SEXT_INREG (#93687)
This is similar to 373c343a, but for targets with zero-or-negative-one
booleans.
The difference in tests is mostly due to G_SEXT_INREG being illegal for
some targets, in which case it gets expanded into G_SHL/G_ASHR pair,
which is not currently optimized by the combiner.
2024-05-30 12:40:42 +03:00
YunQiang Su
0bf181eb34
MIPS: Fix llvm.{min,max}num for R6 (#93125)
MIPS max.fmt/min.fmt instructions is IEEE2008 compatiable. If either
argument is sNaN, the result will be NaN.

So we define fminnum_ieee instead of fminnum in Mips32r6InstrInfo.td. We
also should define fcanonicalize. So that we can define fminnum as
expand to fcanonicalize and fminnum_ieee.
2024-05-23 22:27:17 +08:00
YunQiang Su
eac743d1b0
MIPS: Support '%w' token in inline asm template for MSA (#91920)
MSA registers share the FPRs as its bottom half. So that we can use MSA
instructions to work with normal float/double:
   double a, b, c;
   asm volatile ("fmadd.d %w0, %w1, %w2" : "+f"(a) : "f"(b), "f"(c));

GCC has support it for quite long time.
2024-05-20 14:46:47 +08:00
YunQiang Su
8f21294897
MIPS: Use pcrel|sdata4 for eh_frame (#91291)
Gas uses encoding DW_EH_PE_absptr for PIC, and gnu ld converts it to
DW_EH_PE_sdata4|DW_EH_PE_pcrel.
LLD doesn't have this workarounding, thus complains
```
  relocation R_MIPS_32 cannot be used against local symbol; recompile with -fPIC
  relocation R_MIPS_64 cannot be used against local symbol; recompile with -fPIC
```

So, let's generates asm/obj files with `DW_EH_PE_sdata4|DW_EH_PE_pcrel`
encoding. In fact, GNU ld supports such OBJs well.

For N64, maybe we should use sdata8, while GNU ld doesn't support it
well, and in fact sdata4 is enough now. So we just ignore the `Large`
for `MCObjectFileInfo::initELFMCObjectFileInfo`. Maybe we should switch
back to sdata8 once GNU LD supports it well.

Fixes: #58377.
2024-05-08 17:30:14 +08:00
Cinhi Young
715219482b
[MIPS] match llvm.{min,max}num with {min,max}.fmt for R6 (#89021)
- The behavior is similar to UCOMISD on x86, which is also used to
compare two fp values, specifically on handling of NaNs.
- Update related tests regarding this change.
- The further goal is to implement `llvm.minimum` and `llvm.maximum`
intrinsics for MIPS R6 and Pre-R6.

Part of https://github.com/llvm/llvm-project/issues/64207
2024-04-27 15:53:02 +08:00
yingopq
e1aa16299f
[Mips] Use ANDi in for zero-extend in subword atomic umax/umin for both r2 and pre-R2 (#89881)
About unsigned max/min, ANDi is available for all ISA revisions in
extend before slt insn.
So that we can reduce one instruction.
2024-04-24 22:31:51 +08:00
YunQiang Su
758d97dce0
[MIPS]: Rework atomic max/min expand for subword (#89575)
The current code is so buggy: it can work for few cases. The problems
include:
1. ll/sc works on a whole word, while other parts other than we rmw are
dropped.
  2. The oprands are not well zero-extended for unsigned ops.
3. It doesn't work for big-endian, as the postion of subword differs
with little endian.

And in fact, we can set the return value correct in ll/sc scope, so we
can skip the sinkMBB.
2024-04-23 02:08:12 +08:00
Shilei Tian
3a106e5b2c
[GlobalISel] Fold G_ICMP if possible (#86357)
This patch tries to fold `G_ICMP` if possible.
2024-03-29 15:59:50 -04:00
Wang Pengcheng
610b9e23c5
[SDAG] Use shifts if ISD::MUL is illegal when lowering ISD::CTPOP (#86505)
We can avoid libcalls.

Fixes #86205
2024-03-29 15:38:39 +08:00
Simon Pilgrim
5b544b511c [Mips] ctpop.mir - regenerate checks to improve codegen diff in #86505 2024-03-26 10:43:29 +00:00
yingopq
5d7fd6a04a
[Mips] Restore wrong deletion of instruction 'and' in unsigned min/max processing. (#85902)
Fix #61881
2024-03-24 02:35:42 -04:00
Evgenii Kudriashov
d365a45cb3
[GlobalISel] Introduce G_TRAP, G_DEBUGTRAP, G_UBSANTRAP (#84941)
Here we introduce three new GMIR instructions to cover a set of trap
intrinsics. The idea behind it is that generic intrinsics shouldn't be
used with G_INTRINSIC opcode.

These new instructions can match perfectly with existing trap ISD nodes.
It allows X86, AArch64, RISCV and Mips to reuse SelectionDAG patterns for
selection and avoid manual selection. However AMDGPU is an exception. It
selects traps during legalization regardless SelectionDAG or GlobalISel.

Since there are not many places where traps are used, this change
attempts to clean up all the usages of G_INTRINSIC with trap intrinsics. So,
there is no stage when both G_TRAP and
G_INTRINSIC_W_SIDE_EFFECTS(@llvm.trap) are allowed.
2024-03-23 13:12:44 +01:00
YunQiang Su
d7e28cd82b
MIPS: Support -m(no-)unaligned-access for r6 (#85174)
MIPSr6 ISA requires normal load/store instructions support
misunaligned memory access, while it is not always do so
by hardware. On some microarchitectures or some corner cases
it may need support by OS.

Don't confuse with pre-R6's lwl/lwr famlily: MIPSr6 doesn't
support them, instead, r6 requires lw instruction support
misunaligned memory access. So, if -mstrict-align is used for
pre-R6, lwl/lwr won't be disabled.

If -mstrict-align is used for r6 and the access is not well
aligned, some lb/lh instructions will be used to replace lw.
This is useful for OS kernels.

To be back-compatible with GCC, -m(no-)unaligned-access are also
added as Neg-Alias of -m(no-)strict-align.
2024-03-20 14:18:24 +08:00
Jonas Paulsson
09bc6abba6
[MachineFrameInfo] Refactoring around computeMaxcallFrameSize() (NFC) (#78001)
- Use computeMaxCallFrameSize() in PEI::calculateCallFrameInfo() instead of duplicating the code.

- Set AdjustsStack in FinalizeISel instead of in computeMaxCallFrameSize().
2024-03-18 10:37:59 -04:00
yingopq
755b439694
[Mips] Fix missing sign extension in expansion of sub-word atomic max (#77072)
Add sign extension "SEB/SEH" before compare.

Fix #61881
2024-03-08 15:41:31 -05:00
YunQiang Su
c88beb4112
MIPS: Fix asm constraints "f" and "r" for softfloat (#79116)
This include 2 fixes:
        1. Disallow 'f' for softfloat.
        2. Allow 'r' for softfloat.

Currently, 'f' is accpeted by clang, then LLVM meets an internal error.

'r' is rejected by LLVM by: couldn't allocate input reg for constraint
'r'.

Fixes: #64241, #63632

---------

Co-authored-by: Fangrui Song <i@maskray.me>
2024-02-26 22:08:36 -08:00
yingopq
96abee5eef
[Mips] Fix unable to handle inline assembly ends with compat-branch o… (#77291)
…n MIPS

Modify:
Add a global variable 'CurForbiddenSlotAttr' to save current
instruction's forbidden slot and whether set reorder. This is the
judgment condition for whether to add nop. We would add a couple of
'.set noreorder' and '.set reorder' to wrap the current instruction and
the next instruction.
Then we can get previous instruction`s forbidden slot attribute and
whether set reorder by 'CurForbiddenSlotAttr'.
If previous instruction has forbidden slot and .set reorder is active
and current instruction is CTI. Then emit a NOP after it.

Fix https://github.com/llvm/llvm-project/issues/61045.

Because https://reviews.llvm.org/D158589 was 'Needs Review' state, not
ending, so we commit pull request again.
2024-02-24 15:13:43 +08:00
YunQiang Su
c007fbb198
MipsAsmParser/O32: Don't add redundant $ to $-prefixed symbol in the la macro (#80644)
When parsing the `la` macro, we add a duplicate `$` prefix in
`getOrCreateSymbol`,
leading to `error: Undefined temporary symbol $$yy` for code like:

```
xx:
	la	$2,$yy
$yy:
	nop
```

Remove the duplicate prefix.

In addition, recognize `.L`-prefixed symbols as local for O32.

See: #65020.

---------

Co-authored-by: Fangrui Song <i@maskray.me>
2024-02-14 12:48:55 -08:00
darkbuck
d0f4663f48
[GlobalISel][Mips] Global ISel for brcond
- Enable equivalent between `brcond` and `G_BRCOND`.
- Remove the manual selection of `G_BRCOND` in Mips. Revise test cases.

Reviewers: petar-avramovic, bcardosolopes, arsenm

Reviewed By: arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/81306
2024-02-10 21:44:05 -05:00
Fangrui Song
6b2fd7aed6
[MIPS] Use generic isBlockOnlyReachableByFallthrough (#80799)
FastISel may create a redundant BGTZ terminal which fallthroughes.
```
  BGTZ %2:gpr32, %bb.1, implicit-def $at

bb.1.bb1:
; predecessors: %bb.0
```

The `!I->isBarrier()` check in
MipsAsmPrinter::isBlockOnlyReachableByFallthrough
will incorrectly not print a label, leading to a `Undefined temporary
symbol `
error when we try assembling the output assembly file. See the updated
`Fast-ISel/pr40325.ll` and
https://github.com/rust-lang/rust/issues/108835

In addition, the `SwitchInst` condition is too conservative and prints
many unneeded labels (see the updated tests).

Just use the generic isBlockOnlyReachableByFallthrough, updated by
commit 1995b9fead62f2f6c0ad217bd00ce3184f741fdb for SPARC, which also
handles MIPS.
2024-02-06 09:23:33 -08:00
Nikita Popov
ff9af4c43a [CodeGen] Convert tests to opaque pointers (NFC) 2024-02-05 14:07:09 +01:00
Quentin Dian
112fba974c
[MIRPrinter] Don't print line break when there is no instructions (NFC) (#80147)
Per #80143, we can remove the extra line break when there is no
instruction.
2024-02-01 22:10:52 +08:00
Fangrui Song
f972e4d343 [MC,ELF] .section: unconditionally print section flag 'G' after 'o'
* Placing 'G' before 'M' (SHF_MERGE) can be misleading as the sh_entsize
  argument goes before the section group name, if a reader doesn't know
  that the order of extra arguments is not affected by the order of flags.
* 'a', 'w', and 'x' indicate basic permission-related flags. Separating
  them with 'G' is kinda ugly.

Simplify code and move 'G' after 'o'. The new output is more similar to
GCC.
2024-01-09 10:48:23 -08:00
Fangrui Song
7620f03ef7
[MC] Parse SHF_LINK_ORDER argument before section group name (#77407)
When both SHF_LINK_ORDER | SHF_GROUP flags are set, GNU assembler from
2.35 onwards (https://sourceware.org/PR25381
https://sourceware.org/binutils/docs/as/Section.html) parses the
SHF_LINK_ORDER argument before section group name, different from us.

This is unfortunate, but does not matter because the `.section` flag `o`
is a niche feature only used by compiler instrumentations, not adopted
by hand-written assembly, and using both flags is extremely rare. Let's
just match GNU assembler. There is another benefit: we now support
zero-flag section group with the SHF_LINK_ORDER flag, while previously
there isn't a syntax.

While here, print 'G' after 'o' to be clear that the 'G' argument is
parsed after the 'o' argument. To make the diff smaller, we don't print
'G' after 'w' in the absence of 'o' for now.
2024-01-09 10:42:34 -08:00
yingopq
e13e95bc44
[Mips] Optimize (shift x (and y, BitWidth - 1)) to (shift x, y) (#73889)
Do optimization to turn x >> (shift & 31/63) into a single srlv instead
of andi + srlv, since the mips variable shift instruction already
implicitly masks the shift, like x86, wasm and AMDGPU. Copy the
X86DAGToDAGISel::isUnneededShiftMask() function to MIPS for checking
whether need combine two instructions to one.
2023-12-29 14:53:55 +05:30