52796 Commits

Author SHA1 Message Date
Craig Topper
fdb87640ee [LSR][TTI][RISCV] Disable terminator folding for RISC-V.
This is a partial revert of e947f953370abe8ffc8713b8f3250a3ec39599fe.

It caused a miscompile in downstream testing.

Spoke with Philip offline. We believe the issue is that LSR needs to
make sure the Step of the other AddRec is non-zero. Reverting until
Philip is back from vacation.
2023-12-27 15:13:32 -08:00
David Green
38c9390b59 [AArch64] Add an extra test for #75822. NFC 2023-12-27 10:40:46 +00:00
Shao-Ce SUN
9f6bf00b25
[DAGCombine] Add DAG optimisation for BF16_TO_FP (#69426)
fold bf16_to_fp(op & 0xffff) -> bf16_to_fp(op)
2023-12-27 17:20:54 +08:00
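A minimal sketch of why the fold above is sound, assuming (as the commit message implies) that BF16_TO_FP reads only the low 16 bits of its operand. The function `bf16_to_fp` below is a hypothetical Python model for illustration, not LLVM code.

```python
import struct

def bf16_to_fp(op: int) -> float:
    """Model of a bf16 -> f32 conversion that reads only the low 16 bits of op."""
    low16 = op & 0xFFFF
    # A bfloat16 value is the upper half of an IEEE-754 binary32 bit pattern.
    return struct.unpack("<f", struct.pack("<I", low16 << 16))[0]

wide = 0xABCD3F80  # garbage in the upper bits, bf16 1.0 in the lower bits
# Masking first changes nothing, so the explicit AND is redundant.
assert bf16_to_fp(wide & 0xFFFF) == bf16_to_fp(wide)
```

Because the conversion ignores the upper bits anyway, dropping the `& 0xffff` leaves the result unchanged.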
Min-Yih Hsu
2476e2a911 [M68k] Optimize for overflow arithmetics that will never overflow
We lower overflow arithmetics to their M68kISD counterparts that produce
results of {i16/i32, i8} in which the second result represents CCR. In
the event where we're certain there won't be an overflow, for instance
8 & 16-bit multiplications, we simply use zero in replacement of the
second result.
This patch replaces M68kISD::CMOV that takes this kind of zero or
all-ones CCR as condition value with its corresponding operand value.
2023-12-26 20:55:23 -08:00
Min-Yih Hsu
6f85075ff7 [M68k] U/SMULd32d16 are not supposed to be commutative
M68k only has 16-bit x 16-bit -> 32-bit variant for multiplications
taking 16-bit operands. We still define two input operands for this
class of instructions, and tie the first operand to the result value.
The problem is that the two operands have different register classes
(DR32 and DR16) hence making these instructions commutative produces
invalid MachineInstr (though the final assembly will still be correct).
2023-12-26 20:55:22 -08:00
Freddy Ye
8ddb0fcff9
[X86] Correct operand order of UWRMSR. (#76389) 2023-12-27 09:01:55 +08:00
Min-Yih Hsu
b80e1acc8c [M68k] Improve codegen of overflow arithmetics
The codegen logic for overflow arithmetics (e.g. llvm.uadd.overflow)
was a mess; overflow multiplications were not even supported.
This patch cleans up the legalization of overflow arithmetics and adds
support for common variants of overflow multiplications.
2023-12-26 11:08:11 -08:00
Ivan Kosarev
d51e06c73c
[AMDGPU][True16] Fix the VGPR register class for 16-bit values. (#76170) 2023-12-26 11:34:16 +00:00
Jivan Hakobyan
1d76692cf8
[RISCV][MC] Add support for experimental Zimop extension (#75182)
This implements experimental support for the Zimop extension as
specified here:
https://github.com/riscv/riscv-isa-manual/blob/main/src/zimop.adoc.

This change adds only assembly support.

---------

Co-authored-by: ln8-8 <lyut.nersisyan@gmail.com>
Co-authored-by: ln8-8 <73429801+ln8-8@users.noreply.github.com>
2023-12-26 17:21:38 +08:00
Vettel
dc1fadef23
[MCP] Enhance MCP copy Instruction removal for special case(reapply) (#74239)
Machine Copy Propagation Pass may lose some opportunities to further
remove the redundant copy instructions during the ForwardCopyPropagateBlock
procedure. When we Clobber a "Def" register, we also need to remove the record
from the copy maps that indicates "Src" defined "Def" to ensure the correct semantics
of the ClobberRegister function. This patch reapplies #70778 and addresses the corner
case bug #73512 specific to the AMDGPU backend. Additionally, it refines the criteria
for removing empty records from the copy maps, thereby enhancing overall safety.

For more information, please see the C++ test case generated code in 
"vector.body" after the MCP Pass: https://gcc.godbolt.org/z/nK4oMaWv5.
2023-12-26 16:22:42 +08:00
Brandon Wu
64e63888dd
Recommit [RISCV] Update the interface of sifive vqmaccqoq (#74284) (#75768)
The spec (https://sifive.cdn.prismic.io/sifive/60d5a660-3af0-49a3-a904-d2bbb1a21517_int8-matmul-spec.pdf)
is updated.
2023-12-26 12:59:00 +08:00
Kai Luo
5cfc7b3342
[PowerPC] Add test after #75271 on PPC. NFC. (#75616)
Demonstrate `IMPLICIT_DEF implicit-def ...` can be generated after
coalescing on PPC.

The case is reduced from the failure in #75570. The failure is triggered
after #75271.
2023-12-26 00:21:56 +08:00
Acim Maravic
48f36c6e74
[LLVM] Make use of s_flbit_i32_b64 and s_ff1_i32_b64 (#75158)
Update DAG ISel to support the 64-bit versions S_FF1_I32_B64 and
S_FLBIT_I32_B64

---------

Co-authored-by: Acim Maravic <Acim.Maravic@amd.com>
2023-12-25 11:55:20 +01:00
Yeting Kuo
af837d44c7
[RISCV][DAG] Teach computeKnownBits consider SEW/LMUL/AVL for vsetvli. (#76158)
This patch also adds tests whose masks are too narrow to combine. I think
they can help us find bugs caused by overly large known bits.
2023-12-25 11:18:22 +08:00
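A rough sketch of the reasoning behind the change above, not the actual LLVM implementation. It relies on two facts from the RVV specification: VLMAX = LMUL * VLEN / SEW, and VLEN is at most 65536. The helper names and the optional constant `avl` argument are assumptions made for illustration.

```python
from fractions import Fraction

RVV_MAX_VLEN = 65536  # architectural upper bound on VLEN in the RVV spec

def vl_upper_bound(sew: int, lmul, avl=None) -> int:
    """Largest value vsetvli can return for the given SEW/LMUL (and AVL, if constant)."""
    vlmax = int(Fraction(RVV_MAX_VLEN, sew) * Fraction(lmul))
    # The returned vl never exceeds the requested AVL.
    return vlmax if avl is None else min(vlmax, avl)

def possibly_set_bits(sew: int, lmul, avl=None) -> int:
    """Number of low bits that may be non-zero; every bit above is known zero."""
    return vl_upper_bound(sew, lmul, avl).bit_length()

# SEW=64 with LMUL=1/8 bounds the result by 128, so only 8 low bits can be set.
assert possibly_set_bits(64, Fraction(1, 8)) == 8
```

A mask narrower than this bound cannot be removed, which is the situation the "too narrow to combine" tests are meant to cover.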
Jim Lin
34727b01eb
[RISCV] Remove +experimental-zfbfmin from the testcases for Zvfbfmin intrinsics. NFC. (#76317)
Zvfbfmin doesn't need Zfbfmin also enabled.
2023-12-25 10:04:55 +08:00
Momchil Velikov
4b6968952e
[AArch64] Implement spill/fill of predicate pair register classes (#76068)
We are getting ICE with, e.g.
```
#include <arm_sve.h>

 void g();
 svboolx2_t f0(int64_t i, int64_t n) {
     svboolx2_t r = svwhilelt_b16_x2(i, n);
     g();
     return r;
 }
```
2023-12-22 15:54:12 +00:00
David Green
48b9106656 [AArch64] Add a strict fp reduction test. NFC 2023-12-22 13:25:00 +00:00
Matt Arsenault
f7c3627338
DAG: Implement promotion for strict_fpextend (#74310)
The test is a placeholder and will be merged into the existing test after
additional bugs for illegal f16 targets are fixed.
2023-12-22 17:15:52 +07:00
Matt Arsenault
0e46b49de4 Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"
This reverts commit c398fa009a47eb24f88383d5e911e59e70f8db86.

PPC backend was fixed in 2f82662ce901c6666fceb9c6c5e0de216a1c9667
2023-12-22 16:46:22 +07:00
wangpc
59eebb40fb [RISCV] Fix macro-fusions.mir 2023-12-22 14:49:59 +08:00
Wang Pengcheng
f9c908862a
[RISCV] Split TuneShiftedZExtFusion (#76032)
We split `TuneShiftedZExtFusion` into three fusions to make them
reusable and match the GCC implementation[1].

The zexth/zextw fusions can be reused by XiangShan[2] and other
commercial processors, but shifted zero extension is not so common.

`macro-fusions-veyron-v1.mir` is renamed so that it is not tied to a
specific processor.

References:
[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637303.html
[2] https://xiangshan-doc.readthedocs.io/zh_CN/latest/frontend/decode
2023-12-22 14:37:26 +08:00
Matt Arsenault
c7952d8860 AMDGPU: Add a few more bfloat codegen tests 2023-12-22 12:31:42 +07:00
Matt Arsenault
50ed3b1ecc AMDGPU: Workaround a divergent return value bug in test 2023-12-22 12:31:42 +07:00
Vitaly Buka
0ccc1e7acd Revert "[AArch64] Fold more load.x into load.i with large offset"
Issue #76202

This reverts commit f5687636415969e6d945659a0b78734abdfb0f06.
2023-12-21 21:12:40 -08:00
Jonas Paulsson
74a09bd1ec
[SystemZ] Test improvements for atomic load/store instructions (NFC). (#75630)
Improve tests for atomic loads and stores, mainly by testing 128-bit atomic load and store instructions both with and w/out natural alignment.
2023-12-21 20:48:00 +01:00
Arthur Eubanks
7433b1ca3e Reapply "[X86] Set SHF_X86_64_LARGE for globals with explicit well-known large section name (#74381)"
This reverts commit 19fff858931bf575b63a0078cc553f8f93cced20.

Explicit large globals are now handled properly in the small code model.
2023-12-21 10:51:30 -08:00
Arthur Eubanks
2366d53d8d
[X86] Fix more medium code model addressing modes (#75641)
By looking at whether a global is large instead of looking at the code
model.

This also fixes references to large data in the small code model.

We now always fold any 32-bit offset into the addressing mode with the
large code model since it uses 64-bit relocations.
2023-12-21 10:40:56 -08:00
Tomas Matheson
7bd17212ef Re-land "[AArch64] Codegen support for FEAT_PAuthLR" (#75947)
This reverts commit 9f0f5587426a4ff24b240018cf8bf3acc3c566ae.

Fix expensive checks failure by properly marking register def for ADR.
2023-12-21 18:32:55 +00:00
David Li
f44079db22
[ISel] Add pattern matching for depositing subreg value (#75978)
Depositing value into the lowest byte/word is a common code pattern.
This patch improves the code generation for it to avoid redundant AND
and OR operations.
2023-12-21 10:18:57 -08:00
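The source pattern described above can be sketched as follows. This only models the AND/OR idiom that the patch recognizes; the function name, the 32-bit register width, and the parameter names are assumptions for illustration.

```python
def deposit_low(dst: int, src: int, width: int, reg_bits: int = 32) -> int:
    """Replace the lowest `width` bits of dst with those of src, keeping the rest."""
    low_mask = (1 << width) - 1
    full_mask = (1 << reg_bits) - 1
    # Naive lowering: one AND to clear the field, one AND to trim src, one OR to merge.
    return ((dst & ~low_mask) | (src & low_mask)) & full_mask

# Depositing a byte and a 16-bit word into a 32-bit value.
assert deposit_low(0x12345678, 0xAB, 8) == 0x123456AB
assert deposit_low(0x12345678, 0xFFFFBEEF, 16) == 0x1234BEEF
```

On targets with addressable sub-registers, the same effect can be had by writing the low sub-register directly, which is why the explicit AND and OR become redundant.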
Craig Topper
0dcff0db3a
[RISCV] Add codegen support for experimental.vp.splice (#74688)
IR intrinsics were already defined, but no codegen support had been
added.

I extracted this code from our downstream. Some of it may have come from
https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.
2023-12-21 08:38:32 -08:00
Tomas Matheson
9f0f558742 Revert "[AArch64] Codegen support for FEAT_PAuthLR"
This reverts commit 5992ce90b8c0fac06436c3c86621fbf6d5398ee5.

Buildbot failures with expensive checks enabled.
2023-12-21 16:25:55 +00:00
Jay Foad
8fdfd34cd2
[AMDGPU] Remove GDS and GWS for GFX12 (#76148) 2023-12-21 15:27:08 +00:00
Tomas Matheson
5992ce90b8 [AArch64] Codegen support for FEAT_PAuthLR
- Adds a new +pc option to -mbranch-protection that will enable
  the use of PC as a diversifier in PAC branch protection code.

- When +pauth-lr is enabled (-march=armv9.5a+pauth-lr) in combination
  with -mbranch-protection=pac-ret+pc, the new 9.5-a instructions
  (pacibsppc, retaasppc, etc) are used.

Documentation for the relevant instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/

Co-authored-by: Lucas Prates <lucas.prates@arm.com>
2023-12-21 14:18:33 +00:00
stephenpeckham
7026086073
[XCOFF] Use RLDs to print branches even without -r (#74342)
This presents misleading and confusing output. If you have a function
defined at the beginning of an XCOFF object file, and you have a
function call to an external function, the function call disassembles as
a branch to the local function. That is,

`void f() { f(); g();}`

disassembles as 
    00000000 <.f>:
           0: 7c 08 02 a6   mflr 0
           4: 94 21 ff c0   stwu 1, -64(1)
           8: 90 01 00 48   stw 0, 72(1)
           c: 4b ff ff f5   bl 0x0 <.f>
          10: 4b ff ff f1   bl 0x0 <.f>

With this PR, the second call will display:

`10: 4b ff ff f1   bl 0x0 <.g>  `

Using -r can help, but you still get the confusing output:

          10: 4b ff ff f1   bl 0x0 <.f>
          00000010:  R_RBR        .g
2023-12-21 08:17:32 -06:00
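To see why both calls above print `0x0 <.f>` when relocations are ignored, here is a small sketch that decodes the raw displacement of a PowerPC I-form branch. The function name is hypothetical; the field layout (primary opcode 18, 24-bit LI field, AA and LK bits) follows the Power ISA encoding of `b`/`bl`.

```python
def branch_target(addr: int, insn: int) -> int:
    """Compute the target of a 32-bit PowerPC I-form branch from its raw encoding."""
    if insn >> 26 != 18:
        raise ValueError("not an I-form branch")
    disp = insn & 0x03FFFFFC          # LI field with two implied zero bits
    if disp & 0x02000000:             # sign-extend the 26-bit displacement
        disp -= 0x04000000
    absolute = bool(insn & 0x2)       # AA bit: absolute rather than relative
    return (disp if absolute else addr + disp) & 0xFFFFFFFF

# Both unrelocated encodings from the listing resolve to address 0, i.e. <.f>.
assert branch_target(0x0C, 0x4BFFFFF5) == 0x0
assert branch_target(0x10, 0x4BFFFFF1) == 0x0
```

The second branch only appears to call `.f` because its displacement has not been relocated yet; the R_RBR relocation is what identifies `.g` as the real callee.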
Paschalis Mpeis
2e3d77d6ed
[TLI] Pass replace-with-veclib works with Scalable Vectors. (#73642)

The pass is heavily refactored.
It uses the Masked variant of a TLI method when the Intrinsic operates on Scalable Vectors.

Improve tests for ArmPL and SLEEF Intrinsics:
- Auto-generate test `armpl-intrinsics.ll`, and use active lane mask to have shorter `shufflevector` check lines.
- Update scripts now add `@llvm.compiler.used` instead of using the regex: `@[[LLVM_COMPILER_USED:[a-zA-Z0-9_$"\\.-]+]]`
- Add simplifycfg pass and noalias to ensure tail folding. `noalias` attribute was added only to the `%in.ptr` parameter of the ArmPL Intrinsics.
2023-12-21 12:37:57 +00:00
zhongyunde 00443407
f568763641 [AArch64] Fold more load.x into load.i with large offset
The list of load.x instructions follows canFoldIntoAddrMode in D152828.
Also support LDRSroX, which was missed in canFoldIntoAddrMode.
2023-12-21 18:54:15 +08:00
zhongyunde 00443407
32878c2065 [AArch64] merge index address with large offset into base address
A case for this transformation, https://gcc.godbolt.org/z/nhYcWq1WE
Fold
  mov     w8, #56952
  movk    w8, #15, lsl #16
  ldrb    w0, [x0, x8]
into
  add     x0, x0, 1036288
  ldrb    w0, [x0, 3704]

Only LDRBBroX is supported for now.
Fix https://github.com/llvm/llvm-project/issues/71917
2023-12-21 18:54:14 +08:00
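The arithmetic behind the example above can be checked with a short sketch. It assumes the offset is split into a 4 KiB-aligned part for `add` (a 12-bit immediate shifted left by 12) and a 12-bit unsigned remainder for the byte load; the function name is hypothetical and this is not the LLVM implementation.

```python
def split_offset(offset: int) -> tuple[int, int]:
    """Split a large byte offset into (add immediate, ldrb immediate)."""
    lo = offset & 0xFFF            # fits the unsigned 12-bit load offset
    hi = offset - lo               # multiple of 4096 for `add ..., lsl #12`
    if hi >> 12 > 0xFFF:
        raise ValueError("offset too large for a single add + load")
    return hi, lo

# mov w8, #56952 ; movk w8, #15, lsl #16 materializes this constant.
wide = (15 << 16) | 56952
assert wide == 1039992
assert split_offset(wide) == (1036288, 3704)
```

The two immediates match the `add x0, x0, 1036288` and `ldrb w0, [x0, 3704]` pair shown in the commit message, replacing three instructions with two.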
zhongyunde 00443407
4bad0cb359 [AArch64] Precommit tests for PR75343, NFC 2023-12-21 18:54:14 +08:00
David Green
c0931d4950 [AArch64][GlobalISel] Lower scalarizing G_UNMERGE_VALUES to G_EXTRACT_VECTOR_ELT
This adds post-legalizing lowering of G_UNMERGE_VALUES which take a vector and
produce scalar values for each lane. They are converted to a G_EXTRACT_VECTOR_ELT
for each lane, allowing all the existing tablegen patterns to apply to them.

A couple of tablegen patterns need to be altered to make sure the type of the
constant operand is known, so that the patterns are recognized under global
isel.

Closes #75662
2023-12-21 09:22:23 +00:00
Yeting Kuo
9b561ca044
[RISCV] Make performFP_TO_INTCombine fold with ISD::FRINT. (#76020)
Fold (fp_to_int (frint X)) to (fcvt X) without rounding mode.
2023-12-21 15:03:36 +08:00
Brandon Wu
b3769adbc5
[RISCV] Fix wrong lmul for sf_vfnrclip (#76016) 2023-12-21 13:24:26 +08:00
Florian Hahn
b1a5ee1feb
[ARM] Check all terms in emitPopInst when clearing Restored for LR. (#75527)
emitPopInst checks a single function exit MBB. If other paths also exit
the function and any of their terminators uses LR implicitly, it is not
safe to clear the Restored bit.

Check all terminators for the function before clearing Restored.

This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll
where the machine-outliner previously introduced BLs that clobbered LR
which in turn is used by the tail call return.

Alternative to #73553
2023-12-20 16:56:15 +01:00
Simon Pilgrim
6ec350b483 [X86] SimplifyDemandedVectorEltsForTargetShuffle - don't simplify constant mask if it has multiple uses
Avoid generating extra constant vectors
2023-12-20 15:22:48 +00:00
Hassnaa Hamdi
f3dcc0cba9
[LLVM][AArch64][tblgen]: Match clamp pattern (#75529)
Add isel pattern to replace min(max(v1,v2),v3) with clamp.
Add tests for uclamp, sclamp, bfclamp, fclamp.
2023-12-20 14:36:58 +00:00
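As a quick illustration of the identity behind this pattern, a clamp is a max against the lower bound followed by a min against the upper bound. The sketch below is a scalar and per-lane model with hypothetical names, not the tablegen pattern itself.

```python
def clamp(v, lo, hi):
    """Clamp v into [lo, hi], written as the min(max(...)) form the pattern matches."""
    return min(max(v, lo), hi)

def clamp_lanes(values, lo, hi):
    """Apply the same clamp independently to every lane of a vector."""
    return [clamp(v, lo, hi) for v in values]

assert clamp_lanes([-5, 3, 99], 0, 10) == [0, 3, 10]
```

Matching the two-operation form lets instruction selection emit one clamp instruction instead of a separate max and min.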
Matt Arsenault
b01adc6bed AMDGPU: Strengthen some bfloat tests
Fix bitcast test, which was splitting apart phis intended to force
bitcasts that survive all the way to selection.

Disable the amdgpu-codegenprepare phi splitting, which defeats the technique
of using a phi to ensure a bitcast reaches all the way to selection. Also
add a variety of bfloat tests. These probably need revisiting to avoid the
cast folding into argument loads. Also round out set of bfloat bitcast and
ABI tests.

Add codegen tests for more bf16 operations. The promotion of these works
contrary to the comment.
2023-12-20 19:33:45 +07:00
Matt Arsenault
9e574a3936 DAG: Fix expansion of bf16 sourced extloads
Also fix assorted vector extload failures for AMDGPU.
2023-12-20 19:24:27 +07:00
Nikita Popov
bbe6c81f80 [RISCV] Add missing REQUIRES asserts to test (NFC) 2023-12-20 09:42:14 +01:00
Yeting Kuo
b7376c3196
[RISCV][NFC] Add comments and tests for frint case of performFP_TO_INT_SATCombine. (#76014)
performFP_TO_INT_SATCombine could also serve pattern (fp_to_int_sat
(frint X)).
2023-12-20 14:56:28 +08:00
Mariusz Sikora
9a41a80e76
[AMDGPU] Handle object size and bail if assume-like intrinsic is used in PromoteAllocaToVector (#68744)
The attached test crashes without this change.

We should not remove an assume-like intrinsic instruction if it is used by
another instruction.
2023-12-20 07:47:49 +01:00
Brandon Wu
fb51aae702
[RISCV] Add missing lmul info for SiFive extensions (#76006) 2023-12-20 14:42:47 +08:00