52796 Commits

Author SHA1 Message Date
Joe Nash
ef79d9e38e [AMDGPU][NFC] Regenerate CHECKs as pre-commit for D157426 2023-08-11 09:55:59 -04:00
Paul Walker
ac2a7637fe [SVE] Add test to show incorrect code generation for scalable vector struct loads and stores.
Patch also includes a minor fix to AArch64::isLegalAddressingMode
to ensure all scalable types have a suitable bailout.
2023-08-11 13:35:04 +00:00
Simon Pilgrim
6c119cff31 [X86] combineConcatVectorOps - extend PACKSS/PACKUS handling to 512-bit nodes on BWI targets.
Fixes another TRUNCATE -> PACKSS/PACKUS regression when #63710 finally gets fixed
2023-08-11 13:25:24 +01:00
Simon Pilgrim
0464a8f4a4 [X86] Add tests showing failure to concat(pack(),pack()) 512-bit results on BWI targets 2023-08-11 13:15:06 +01:00
Matt Arsenault
29fff3e2ab AMDGPU: Try to select fmul by power of 2 to ldexp
For the f64 case, this gives us a cheaper to materialize 32-bit
constant. It's less obviously a win for f32 and f16. It forces us to
use a VOP3 encoding so it's a neutral code size change.

GlobalISel cases don't work because of the constant-is-copy-to-vgpr
problem.

https://reviews.llvm.org/D157111
2023-08-11 07:57:55 -04:00
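
The fmul-to-ldexp rewrite in the commit above rests on the identity x * 2^n == ldexp(x, n). The following is a minimal Python sketch of that identity, not the AMDGPU selection code: `power_of_two_exponent` and `fmul_as_ldexp` are hypothetical helper names, and only positive power-of-two constants are handled here.

```python
import math

def power_of_two_exponent(c):
    """Return n if |c| == 2**n exactly (c finite and nonzero), else None."""
    if c == 0.0 or math.isinf(c) or math.isnan(c):
        return None
    # frexp gives c == mantissa * 2**exponent with 0.5 <= |mantissa| < 1,
    # so an exact power of two always has |mantissa| == 0.5.
    mantissa, exponent = math.frexp(c)
    if abs(mantissa) != 0.5:
        return None
    return exponent - 1

def fmul_as_ldexp(x, c):
    """Compute x * c, using ldexp when c is a positive power of two."""
    n = power_of_two_exponent(c)
    if n is None or c < 0:
        return x * c          # not a candidate in this sketch: plain multiply
    return math.ldexp(x, n)   # the small integer n replaces the FP constant

print(fmul_as_ldexp(3.0, 8.0))   # multiply by 2**3
print(fmul_as_ldexp(1.5, 0.25))  # multiply by 2**-2
```

The point of the rewrite is visible in the second argument: the multiplier becomes a small integer exponent instead of a floating-point constant.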
Matt Arsenault
c8a4f2a8c1 AMDGPU: Add baseline tests for fmul-to-ldexp patterns
We can better select some multiply-by-power-of-2 patterns as ldexp.
2023-08-11 07:57:55 -04:00
Anatoly Trosinenko
81300f75f4 [AArch64][PAC] Remove the duplication of LR sign/auth implementations
In the machine outliner implementation for AArch64, `signOutlinedFunction()`
reimplements signing the LR value in prologue and authenticating it in
epilogue of the outlined function. This patch factors out `signLR()` and
`authenticateLR()` functions from AArch64FrameLowering code and reuses
them in `signOutlinedFunction()`.

The `mergeOutliningCandidateAttributes()` outliner callback is
introduced as well to further unify signing and authentication of the LR
value.

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D157320
2023-08-11 14:39:18 +03:00
David Green
7720b9a7e8 [AArch64] Extend and cleanup vecreduce.fmin/max tests. NFC
See D156614 and D156615. This extends and unifies the types tested in
vecreduce min/max tests to make them more useful to GlobalISel.
2023-08-11 12:16:22 +01:00
Simon Pilgrim
14d1e502df [X86] combineConcatVectorOps - fold a 512-bit splat of a 128-bit subvector to a single X86ISD::SHUF128 node.
Replaces a pair of insert_subvectors with a single (implicitly widened) vector - also reduce uses of the src.

Hopefully this should address most of the remaining widen subvector regressions I'm seeing while trying to aggressively convert TRUNCATE to PACKSS/PACKUS.
2023-08-11 12:14:02 +01:00
pvanhout
14cfe92975 [AArch64][GlobalISel] Regenerate select combine tests
Will be modified in D157690
2023-08-11 12:45:09 +02:00
Stanislav Mekhanoshin
02046ad944 [AMDGPU] W/a for gfx940 byte0 fp8 conversion bug
The VOP1 forms of these do not work.

Differential Revision: https://reviews.llvm.org/D157683
2023-08-11 02:21:21 -07:00
David Green
acd17ea662 [AArch64][GISel] Expand handling for G_FSQRT to more vector types
Similar to G_FABS, these can reuse the existing lowering to successfully handle
more types.
2023-08-11 10:16:45 +01:00
Simon Pilgrim
ef46046060 [X86] combineConcatVectorOps - add handling for X86ISD::VPERM2X128 nodes.
On AVX512 targets we can concatenate these and create a X86ISD::SHUF128 node.

Prevents regression on some future work to improve codegen for concat_vectors(extract_subvector(),extract_subvector()) (mainly via vector widening) patterns.
2023-08-11 10:01:13 +01:00
Lawrence Benson
c7b537bf09 [AArch64] Add more efficient vector bitcast for v16i8
We previously split the vector into two halves and performed two vector reduce operations followed by bit shifting and bitwise or. Now, we use NEON's zip1 to concatenate
the halves in a smart way and then perform only a single vector reduce. This boosts performance quite a bit for this small routine, as vector reduce is a rather expensive
instruction. Original discussion for this started in: https://reviews.llvm.org/D145301

Differential Revision: https://reviews.llvm.org/D156544
2023-08-11 10:10:42 +02:00
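
The v16i8 bitcast commit above replaces two reductions with one. The sketch below models both approaches on plain Python lists, under the assumption that each of the 16 input lanes is either 0x00 or 0xFF and that lane i should become bit i of a 16-bit result. The function names are hypothetical and the lists only stand in for NEON registers.

```python
def masked_bits(lanes):
    """AND each lane with a per-lane bit: 1, 2, ..., 128, repeated for both halves."""
    return [lane & (1 << (i % 8)) for i, lane in enumerate(lanes)]

def bitmask_two_reductions(lanes):
    """Older approach: reduce each 8-lane half, then shift and OR."""
    bits = masked_bits(lanes)
    low = sum(bits[:8])    # first reduction: result bits 0..7
    high = sum(bits[8:])   # second reduction: result bits 8..15, not yet shifted
    return low | (high << 8)

def bitmask_zip_single_reduction(lanes):
    """Newer approach: interleave the halves, then reduce once over 16-bit lanes."""
    bits = masked_bits(lanes)
    # zip1-style interleave: low0, high0, low1, high1, ...
    interleaved = [b for pair in zip(bits[:8], bits[8:]) for b in pair]
    # Viewing byte pairs as little-endian 16-bit lanes puts each high-half
    # byte into the upper 8 bits of its lane.
    halfwords = [interleaved[i] | (interleaved[i + 1] << 8) for i in range(0, 16, 2)]
    return sum(halfwords) & 0xFFFF  # single reduction

lanes = [0xFF if i in (0, 3, 8, 15) else 0x00 for i in range(16)]
print(hex(bitmask_two_reductions(lanes)), hex(bitmask_zip_single_reduction(lanes)))
```

Every lane contributes a distinct bit, so the sums never carry and both functions agree on every input.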
Nikita Popov
59d558a378 [X86] Add test for PR64589 (NFC) 2023-08-11 09:52:25 +02:00
pvanhout
490a867f16 [GlobalISel] Also set dead flags of implicit defs added by BuildMI
BuildMI automatically adds the implicit operands of the
instruction. This meant we couldn't set the dead flag on
dead implicit defs in that case.

Fix it by introducing an opcode to mark a given implicit
def as dead.

Fixes #64565

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D157515
2023-08-11 08:38:37 +02:00
pvanhout
89e91e4c0c [AMDGPU] Remove post-PromoteAlloca SROA run
PromoteAlloca now uses SSAUpdater, so it no longer needs SROA to clean up after it.

Internal testing shows no noticeable performance impact.

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D156398
2023-08-11 08:29:21 +02:00
Yeting Kuo
69cc5a4e1a [LegalizeTypes] Support promotion for vp bitmanip sdnodes.
This supports promotion for vp.bitreverse/bswap/ctlz/ctlz_zero_undef/cttz/cttz_zero_undef/ctpop/fshr/fshl.

Reviewed By: craig.topper, luke

Differential Revision: https://reviews.llvm.org/D157607
2023-08-11 08:27:42 +08:00
Eduard Zingerman
e66affa17e Revert "[BPF] support for BPF_ST instruction in codegen"
This reverts commit 92e28e397d4ccf1bff075f48e22cf1e23a7d02bf.

Reverting to investigate buildbot failure reported in [1].

    field-reloc-st-imm.ll:
    *** Bad machine code: Explicit definition must be a register ***
    - function:    bar
    - basic block: %bb.0 entry (0x742f318)
    - instruction: CORE_MEM 3, 416, %0:gpr, @"llvm.foo:0:4$0:2", ...
    - operand 0:   3
    *** Bad machine code: Explicit definition must be a register ***
    - function:    bar
    - basic block: %bb.0 entry (0x742f318)
    - instruction: CORE_MEM 4, 410, %0:gpr, @"llvm.foo:0:8$0:3", ...
    - operand 0:   4
    LLVM ERROR: Found 4 machine code errors.

[1] https://lab.llvm.org/buildbot/#/builders/16/builds/52877
2023-08-11 02:23:40 +03:00
Eduard Zingerman
92e28e397d [BPF] support for BPF_ST instruction in codegen
Generate store immediate instruction when CPUv4 is enabled.
For example:

    $ cat test.c
    struct foo {
      unsigned char  b;
      unsigned short h;
      unsigned int   w;
      unsigned long  d;
    };
    void bar(volatile struct foo *p) {
      p->b = 1;
      p->h = 2;
      p->w = 3;
      p->d = 4;
    }

    $ clang -O2 --target=bpf -mcpu=v4 test.c -c -o - | llvm-objdump -d -
    ...
    0000000000000000 <bar>:
           0:	72 01 00 00 01 00 00 00	*(u8 *)(r1 + 0x0) = 0x1
           1:	6a 01 02 00 02 00 00 00	*(u16 *)(r1 + 0x2) = 0x2
           2:	62 01 04 00 03 00 00 00	*(u32 *)(r1 + 0x4) = 0x3
           3:	7a 01 08 00 04 00 00 00	*(u64 *)(r1 + 0x8) = 0x4
           4:	95 00 00 00 00 00 00 00	exit

Take special care to:
- apply `BPFMISimplifyPatchable::checkADDrr` rewrite for BPF_ST
- validate immediate value when BPF_ST write is 64-bit:
  BPF interprets `(BPF_ST | BPF_MEM | BPF_DW)` writes as writes with
  sign extension. Thus it is fine to generate such write when
  immediate is -1, but it is incorrect to generate such write when
  immediate is +0xffff_ffff.

Differential Revision: https://reviews.llvm.org/D140804
2023-08-11 02:07:29 +03:00
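
The immediate-validation rule in the BPF_ST commit above can be checked with a few lines of arithmetic. This is a minimal Python sketch, assuming the standard 8-byte BPF instruction layout (opcode byte, register nibbles, 16-bit offset, 32-bit immediate, little-endian). `fits_bpf_st_dw_imm` and `decode_bpf_insn` are hypothetical names, not functions from the BPF backend.

```python
import struct

def fits_bpf_st_dw_imm(value):
    """True if a 64-bit store of `value` survives the 32-bit immediate's sign extension."""
    value &= 0xFFFFFFFFFFFFFFFF          # work on the raw 64-bit pattern
    imm32 = value & 0xFFFFFFFF
    upper = 0xFFFFFFFF00000000 if imm32 & 0x80000000 else 0
    return (imm32 | upper) == value      # sign-extended immediate must reproduce value

def decode_bpf_insn(raw):
    """Split one 8-byte instruction into its fields (assumed little-endian layout)."""
    opcode, regs, offset, imm = struct.unpack("<BBhi", raw)
    return {"opcode": opcode, "dst": regs & 0x0F, "src": regs >> 4,
            "off": offset, "imm": imm}

print(fits_bpf_st_dw_imm(-1))           # the -1 case from the commit message
print(fits_bpf_st_dw_imm(0xFFFFFFFF))   # the +0xffff_ffff case from the commit message
print(decode_bpf_insn(bytes.fromhex("7a 01 08 00 04 00 00 00")))
```

The decoded bytes are the 64-bit store from the objdump listing in the commit message: destination register r1, offset 8, immediate 4.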
Matt Arsenault
7575ee7167 AMDGPU: Add more test coverage for FP-typed atomicrmw xchg 2023-08-10 17:38:25 -04:00
Matt Arsenault
c8cac15613 PreISelIntrinsicLowering: Check RuntimeLibcalls instead of TLI for memory functions
We need a better mechanism for expressing which calls you are allowed
to emit and which calls are recognized. This should be applied to the
17 branch.
2023-08-10 16:40:04 -04:00
Changpeng Fang
1e22873ef4 [AMDGPU][NFC] Rename two LIT test files 2023-08-10 11:31:14 -07:00
Craig Topper
2df9328fe3 [RISCV] Stop performFP_TO_INTCombine from folding with ISD::FRINT.
FRINT was added to matchRoundingOp after this function was written.
So FRINT was not tested originally.

For vectors, folding this causes us to create a CSR swap that tries
to write 7 to FRM. This is an illegal value and will cause the CSR
write to fail.

While this might be a legal fold we could do, I'm disabling it for
now so we can backport to LLVM 17 with the least risk.

Differential Revision: https://reviews.llvm.org/D157583
2023-08-10 09:30:36 -07:00
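
The value 7 mentioned in the FRINT commit above is the DYN encoding, which selects dynamic rounding in an instruction's rm field but does not name a rounding mode that FRM itself can hold. The sketch below lists the encodings as I understand them from the RISC-V F extension. The helper name `is_valid_frm_value` is hypothetical, and the table should be checked against the ISA manual before relying on it.

```python
# Static rounding-mode encodings (assumed from the RISC-V F extension).
ROUNDING_MODES = {
    0b000: "RNE",  # round to nearest, ties to even
    0b001: "RTZ",  # round towards zero
    0b010: "RDN",  # round down
    0b011: "RUP",  # round up
    0b100: "RMM",  # round to nearest, ties to max magnitude
}
DYN = 0b111  # meaningful only in an instruction's rm field: "use the mode in FRM"

def is_valid_frm_value(value):
    """True if `value` names a concrete rounding mode that FRM can hold."""
    return value in ROUNDING_MODES

print(is_valid_frm_value(0b001))  # RTZ
print(is_valid_frm_value(DYN))    # the value the disabled fold tried to write
```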
Philip Reames
b1ada7a1d3 [DAG] Support store merging of vector constant stores (try 2)
Original commit didn't handle the case where one of the stores was a
truncating store of the build_vector.  The existing codepath produced
wrong code (which thankfully also failed asserts) instead of guarding
against unexpected types.  Original commit message follows..

Ran across this when making a change to RISCV memset lowering. Seems
very odd that manually merging a store into a vector prevents it from
being further merged.

Differential Revision: https://reviews.llvm.org/D156349
2023-08-10 08:54:05 -07:00
Philip Reames
e838471bc4 [X86] Add regression test case from pr64593
This is the case which triggered the revert of 660b740.  Note that the test is extremely fragile as it depends on getting a truncating store at the right moment rather than folding the constant to a narrower bitwidth.  This appears to happen on skylake, but not e.g. plain avx.
2023-08-10 08:49:41 -07:00
Patrick O'Neill
fcad2bbcfc [RISC-V] Add proposed mapping for Ztso
Currently LLVM emits Ztso code for fences, loads, and stores (behind an
experimental flag) [1]. This patch updates the mapping and implements
support for LR/SC and AMO ops. This updated mapping is compatible with
the RVWMO ABI present in the psABI. Additional context can be found in
the psABI pull request [2].

[1] https://reviews.llvm.org/D143076
[2] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/391

Differential Revision: https://reviews.llvm.org/D155517
2023-08-10 15:59:06 +01:00
Philip Reames
0696a531c2 Revert "[DAG] Support store merging of vector constant stores"
This reverts commit 660b740e4b3c4b23dfba36940ae0fe2ad41bfedf.  Crash reported in the review thread post commit.  Reverting while investigating.
2023-08-10 07:58:00 -07:00
Nabeel Omer
d43634cd74 [X86] Pre-commit test for D157513
https://reviews.llvm.org/D157513
2023-08-10 15:40:12 +01:00
Luke Lau
b165a7779d [RISCV] Remove completed FIXME. NFC
Looks like this FIXME was already taken off during the original patch in
https://reviews.llvm.org/D104921
2023-08-10 15:31:05 +01:00
Sean Fertile
b37c7ed0c9 [PPC][AIX] Fix toc-data peephole bug and some related cleanup.
Set the ReplaceFlags variable to false, since there is code meant only
for the ADDItocHi/ADDItocL nodes. This has the side effect of disabling
the peephole when the load/store instruction has a non-zero offset.
This patch also fixes retrieving the `ImmOpnd` node from the AIX small
code model pseudos and does the same for the register operand node.
This allows cleaning up the later calls to replaceOperands.
Finally move calculating the MaxOffset into the code guarded by
ReplaceFlags as it is only used there and the comment is specific to the ELF
ABI.

Fixes https://github.com/llvm/llvm-project/issues/63927

Differential Revision: https://reviews.llvm.org/D155957
2023-08-10 10:23:15 -04:00
Jay Foad
3091bdb86d [AMDGPU] Do not release VGPRs at -O0
This was an oversight when the GFX11 early release VGPRs optimization
was reimplemented in D153279.

Sending the DEALLOC_VGPRS message is a performance optimization so there
is no need to do it at -O0. In addition it makes some kinds of post
mortem debugging hard or impossible, since VGPR values are no longer
available to inspect at the s_endpgm instruction.

Differential Revision: https://reviews.llvm.org/D157599
2023-08-10 14:58:06 +01:00
Simon Pilgrim
4ed452b747 [X86] getFauxShuffleMask - handle insert_subvector(src, bitcast(extract_subvector(sub))) patterns
Add bitcast handling to the existing insert_subvector(src, extract_subvector(sub)) pattern, and recognise undef src cases to allow us to detect vector widening patterns.
2023-08-10 13:38:38 +01:00
Paul Walker
3d65f8211f [SVE] Expand scalable vector ISD::BITCASTs when targeting big-endian.
Whilst sub-optimal, it's better than the current selection failure.

Fixes: #64406

Differential Revision: https://reviews.llvm.org/D157406
2023-08-10 11:02:01 +00:00
Jianjian GUAN
8901eb281f [RISCV] Fix zihintntl test 2023-08-10 17:18:17 +08:00
David Green
c26459258a [AArch64] Update check lines in neon-compare-instructions.ll
-global-isel-abort=2 is no longer required, and many of the tests can now
share CHECK lines between SDAG and GlobalISel.
2023-08-10 10:09:13 +01:00
Jianjian GUAN
f808788487 [RISCV] Remove experimental for zihintntl
Since zihintntl is ratified now, we could remove the experimental prefix and change its version to 1.0.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D151547
2023-08-10 17:04:49 +08:00
Neumann Hon
3e139be29f [SystemZ][z/OS] Add support for function name field of PPA1
This PR causes the PPA1 to emit the function's name if it exists. This field is not emitted for unnamed functions.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D157494
2023-08-10 04:40:19 -04:00
David Green
b720dcba92 [AArch64][GISel] Split large f64 vectors for fcmp.
This adds some very basic f64 handling for larger fcmp vectors, which seemed to
be missing.
2023-08-10 08:19:22 +01:00
Yunze Zhu
5f73d2b780 [RISCV] Enable alias analysis by default
In LLVM, alias analysis during code generation is currently off by default.
This patch enables it by default for the RISCV target,
which creates more opportunities for improving performance.
Related test cases are updated.

Differential Revision: https://reviews.llvm.org/D157250
2023-08-10 10:48:43 +08:00
Matt Arsenault
6dbd458128 AMDGPU: Remove pointless libcall optimization of fma/mad
After the library is linked and trivially inlined, the generic fma and
fmuladd intrinsics already handle these cases, and with precise flag
handling. This was requiring all fast math flags when we really just
need nsz for the fma(a, b, 0) case.

https://reviews.llvm.org/D156677
2023-08-09 19:37:52 -04:00
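
The nsz requirement for the fma(a, b, 0) case in the commit above comes down to signed zeros: adding +0.0 to a product of -0.0 yields +0.0, so dropping the addend changes the sign of the result. Below is a minimal Python sketch of that corner case. It uses a separate multiply and add rather than a fused operation, which gives the same result here because the product is exact; `is_negative_zero` is a hypothetical helper.

```python
import math

def is_negative_zero(x):
    """True only for -0.0 (copysign exposes the sign that == hides)."""
    return x == 0.0 and math.copysign(1.0, x) < 0

a, b = -1.0, 0.0
product = a * b              # what the folded form fmul(a, b) produces
with_addend = a * b + 0.0    # what fma(a, b, +0.0) produces for this exact product

print(is_negative_zero(product))      # product keeps the negative sign
print(is_negative_zero(with_addend))  # the +0.0 addend turns it into +0.0
```

For every other input the two forms agree, which is why the commit notes that nsz alone, rather than the full set of fast math flags, is enough for this fold.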
Matt Arsenault
6448d5ba58 AMDGPU: Remove pointless libcall recognition of native_{divide|recip}
This was trying to constant fold these calls, and also turn some of
them into a regular fmul/fdiv. There's no point to doing that, the
underlying library implementation should be using those in the first
place. Even when the library does use the rcp intrinsics, the backend
handles constant folding of those. This was also only performing the
folds under overly strict fast-everything-is-required conditions.

The one possible plus this gained over linking in the library is if
you were using all fast math flags, it would propagate them to the new
instructions. We could address this in the library by adding more fast
math flags to the native implementations.

The constant fold case also had no test coverage.

https://reviews.llvm.org/D156676
2023-08-09 18:48:46 -04:00
Matt Arsenault
58e87c961e AMDGPU: Port AMDGPULowerKernelArguments to new pass manager
https://reviews.llvm.org/D157498
2023-08-09 18:34:30 -04:00
Matt Arsenault
1ca0808db2 GlobalISel: Don't expand stacksave/stackrestore in IRTranslator
In some edge cases (likely invalid anyway), it's not correct to
directly copy the stack pointer register.
2023-08-09 18:33:55 -04:00
Matt Arsenault
25bc999d1f Intrinsics: Add type overload to stacksave and stackrestore
This allows use with non-0 address space stacks. llvm_ptr_ty should
never be used. This could use some more percolation up through mlir,
but this is enough to fix existing tests.

https://reviews.llvm.org/D156666
2023-08-09 18:33:11 -04:00
priyanshi1708
b16a0f9f6e [AArch64][Optimization]Emit FCCMP for AND of two float compares
Transforms and(fcmp(a, b), fcmp(c, d)) into fccmp(fcmp(a, b), c, d)
Issue link: https://github.com/llvm/llvm-project/issues/60819

Differential Revision: https://reviews.llvm.org/D152714
2023-08-09 15:58:04 +01:00
Paul Walker
b7e6e568b4 [SelectionDAG] Fix problematic call to EVT::changeVectorElementType().
The function changeVectorElementType assumes MVT input types will
result in MVT output types.  There's no guarantee this is possible
during early code generation and so this patch converts an instance
used during initial DAG construction to instead explicitly create a
new EVT.

NOTE: I could have added more MVTs, but that seemed unscalable as
you can either have MVTs with 100% element count coverage or 100%
bitwidth coverage, but not both.

Differential Revision: https://reviews.llvm.org/D157392
2023-08-09 12:50:02 +00:00
Matt Devereau
175850f987 [AArch64][SVE2] Combine trunc+add+lsr to rshrnb
The example sequence

  add z0.h, z0.h, #32
  lsr z0.h, #6
  st1b z0.h, x1

can be replaced with

  rshrnb z0.b, #6
  st1b z0.h, x1

since the top half of each destination element is truncated.

In similar fashion,

  add z0.s, z0.s, #32
  lsr z1.s, z1.s, #6
  add z1.s, z1.s, #32
  lsr z0.s, z0.s, #6
  uzp1 z0.h, z0.h, z1.h

Can be replaced with

  rshrnb z1.h, z1.s, #6
  rshrnb z0.h, z0.s, #6
  uzp1 z0.h, z0.h, z1.h

Differential Revision: https://reviews.llvm.org/D155299
2023-08-09 12:49:42 +00:00
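
The trunc+add+lsr combine in the commit above is safe because the only bit the two sequences can disagree on is the carry out of the wide add, and truncation discards it. The sketch below checks this for 16-bit lanes narrowed to 8 bits. It assumes that the rounding shift adds the rounding constant without wrapping, which is my reading of the instruction rather than something stated in the commit, and both function names are hypothetical.

```python
def add_lsr_trunc(x, shift):
    """add #(1 << (shift - 1)), lsr #shift on a 16-bit lane, then truncate to 8 bits."""
    wrapped = (x + (1 << (shift - 1))) & 0xFFFF   # the add wraps at 16 bits
    return (wrapped >> shift) & 0xFF

def rounding_shift_narrow(x, shift):
    """Model of a rounding shift right narrow: round, shift, keep the low 8 bits."""
    return ((x + (1 << (shift - 1))) >> shift) & 0xFF   # no intermediate wrap

# Exhaustive comparison over every 16-bit input for the shift used in the commit.
mismatches = [x for x in range(1 << 16)
              if add_lsr_trunc(x, 6) != rounding_shift_narrow(x, 6)]
print(len(mismatches))
```

With a shift of 6, a carry out of bit 15 would land in bit 10 of the shifted value, above the 8 bits that survive narrowing.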
Quentin Colombet
bb206cb131 [NVPTX] Apply global var demotion to private symbols
When emitting the assembly we perform some late global variables demotion.
Prior to this patch, this optimization was only performed on variables with
the internal linkage whereas any local global variable can be demoted.

Fix that by using `hasLocalLinkage` instead of `hasInternalLinkage`.

Without this change, global variables with the `private` linkage wouldn't
be demoted.

Differential Revision: https://reviews.llvm.org/D154507
2023-08-09 14:41:01 +02:00
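
The difference between the two predicates named in the NVPTX commit above can be stated in a few lines. This is a toy Python model of the linkage check, not LLVM's API: the `Linkage` enum and both functions are illustrative stand-ins, on the understanding that local linkage covers both internal and private linkage.

```python
from enum import Enum, auto

class Linkage(Enum):
    EXTERNAL = auto()
    INTERNAL = auto()
    PRIVATE = auto()

def has_internal_linkage(linkage):
    """Stand-in for the old check: internal linkage only."""
    return linkage is Linkage.INTERNAL

def has_local_linkage(linkage):
    """Stand-in for the new check: internal or private linkage."""
    return linkage in (Linkage.INTERNAL, Linkage.PRIVATE)

for linkage in Linkage:
    print(linkage.name, has_internal_linkage(linkage), has_local_linkage(linkage))
```

Only the `PRIVATE` row differs between the two columns, which matches the set of globals the commit newly allows to be demoted.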
Sander de Smalen
ecb7b9c5c5 [Clang][AArch64] Diagnostics for SME attributes when target doesn't have 'sme'
This patch adds error diagnostics to Clang when code uses the AArch64 SME
attributes without specifying 'sme' as an available target attribute.

* Function definitions marked as '__arm_streaming', '__arm_locally_streaming',
  '__arm_shared_za' or '__arm_new_za' will by definition use or require SME
  instructions.
* Calls from non-streaming functions to streaming-functions require
  the compiler to enable/disable streaming-SVE mode around the call-site.

In some cases we can accept the SME attributes without having 'sme' enabled:
* Function declarations can have the SME attributes.
* Definitions can be __arm_streaming_compatible since the generated
  code should execute on processing elements without SME.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D157269
2023-08-09 12:31:02 +00:00