52796 Commits

Author SHA1 Message Date
Luke Lau
18013bea46 [RISCV] Add tests for unaligned segmented loads and stores
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D154535
2023-07-07 15:34:22 +01:00
Matt Arsenault
94e24624c2 AMDGPU: Remove attempt at simplifying the format string in printf lowering
This avoids computing the dominator tree by removing the
simplifyInstruction use.

This was applying simplification with some kind of questionable
load-store forwarding and looking for the global. This had to have
been an ancient hack copied from previous backends. In the OpenCL
case, this is always emitted as the required direct global reference
anyway.
2023-07-07 09:26:07 -04:00
Lucas Prates
54c7aec449 [AArch64][RCPC3] Instruction selection for LDAP1/STL1 instructions
This implements the DAG patterns to enable instruction selection for the
LDAP1 and STL1 instructions from FEAT_LRCPC3. The instructions should
match the following combinations:

* Acquiring atomic load + vector insert element for LDAP1.
* Vector extract element + releasing atomic store for STL1.

Patterns have also been added to cope with the DAG structure found when
dealing with 1-lane sub-vectors.

Reviewed By: tmatheson, efriedma

Differential Revision: https://reviews.llvm.org/D153129
2023-07-07 12:32:56 +01:00
WuXinlong
c0221e006d [RISCV] Add a pass to combine cm.pop and ret insts
`RISCVPushPopOptimizer.cpp` combines `cm.pop` and `ret` to generate `cm.popretz` or `cm.popret`.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D150416
2023-07-07 14:04:11 +08:00
Jim Lin
43927542d8 [RISCV] Rename prefix fixed-vector to fixed-vectors to match other testcases. NFC.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154679
2023-07-07 13:04:00 +08:00
Craig Topper
a403124998 [RISCV] Don't sink i1 vectors in shouldSinkOperands.
These can't create .vx instructions so there's no reason to sink them.
2023-07-06 20:36:55 -07:00
WuXinlong
6269ed24cf [RISCV] Readjusting the framestack for Zcmp
This patch readjusts the frame stack for the push and pop instructions.

co-author: @Lukacma

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D134599
2023-07-07 11:24:21 +08:00
Matt Arsenault
64df9573a7 DAG: Handle inversion of fcSubnormal | fcZero
There are a number of additional test combinations here that
can be handled together to reduce the number of instructions.

https://reviews.llvm.org/D143191
2023-07-06 21:19:44 -04:00
Eduard Zingerman
0bf9bfeacc Revert "[BPF] Undo transformation for LICM.cpp:hoistMinMax()"
This reverts commit 09feee559a294611257ee157dba039fb05fe4f68.

Revert because of a testbot failure:
  https://lab.llvm.org/buildbot/#/builders/5/builds/34931
2023-07-07 04:01:31 +03:00
Craig Topper
be253cb987 [RISCV] Support i32 brev8 intrinsic on RV64.
Similar to what we do for orc.b. Another patch will expose this
as a builtin in clang.
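
For orientation, brev8 reverses the bit order inside each byte. The sketch below is a scalar model, not the patch's lowering: `brev8_64`, `brev8_32`, and `checkBrev8` are hypothetical names, and the widen-then-truncate wrapper is only an assumption about how an i32 form can be served by a 64-bit operation. It works because the operation is bytewise, so the upper input bits never reach the low 32 result bits.

```cpp
#include <cassert>
#include <cstdint>

// Reverses the bit order inside every byte of a 64-bit value
// (the per-byte semantics of brev8); byte positions are unchanged.
std::uint64_t brev8_64(std::uint64_t x) {
  x = ((x & 0x5555555555555555ULL) << 1) | ((x >> 1) & 0x5555555555555555ULL);
  x = ((x & 0x3333333333333333ULL) << 2) | ((x >> 2) & 0x3333333333333333ULL);
  x = ((x & 0x0F0F0F0F0F0F0F0FULL) << 4) | ((x >> 4) & 0x0F0F0F0F0F0F0F0FULL);
  return x;
}

// Hypothetical i32 wrapper (an assumption, not the patch's code):
// widen to 64 bits, apply the 64-bit operation, truncate.
std::uint32_t brev8_32(std::uint32_t x) {
  return static_cast<std::uint32_t>(brev8_64(x));
}

// Checks two known values and that garbage in the upper 32 input bits
// does not change the low 32 result bits.
void checkBrev8() {
  assert(brev8_32(0x00000001u) == 0x00000080u);
  assert(brev8_32(0x12345678u) == 0x482C6A1Eu);
  assert(static_cast<std::uint32_t>(brev8_64(0xFFFFFFFF12345678ULL)) ==
         0x482C6A1Eu);
}
```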
2023-07-06 17:24:53 -07:00
Derek Schuff
ad14659f72 [WebAssembly] Add frexp{f,l} libcall signatures
The llvm.frexp.* family of intrinsics and their corresponding libcalls were
recently added, which means we need to know their signatures.

Differential Revision: https://reviews.llvm.org/D154639
Fixed: https://github.com/llvm/llvm-project/issues/63657
2023-07-06 13:37:11 -07:00
Matt Arsenault
61820f8b5d CodeGen: Optimize lowering of is.fpclass fcZero|fcSubnormal
Combine the two checks into a check if the exponent bits are 0. The
inverted case isn't reachable until a future change, and GlobalISel
currently doesn't attempt the inversion optimization.
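
The combined check can be illustrated on the host with a small sketch, assuming `float` is IEEE-754 binary32. The helper names are hypothetical and this is not the lowering code itself; it only shows that "zero or subnormal" matches "exponent field is all zero".

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// True when x is +/-0 or subnormal, tested by checking that the
// binary32 exponent field (bits 30..23) is all zero.
bool isZeroOrSubnormal(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return (bits & 0x7F800000u) == 0;
}

// Reference predicate built from the <cmath> classification.
bool isZeroOrSubnormalRef(float x) {
  int c = std::fpclassify(x);
  return c == FP_ZERO || c == FP_SUBNORMAL;
}

// Compares both predicates on a handful of representative inputs.
void checkSamples() {
  const float samples[] = {0.0f,
                           -0.0f,
                           1.0f,
                           -1.0f,
                           std::numeric_limits<float>::denorm_min(),
                           -std::numeric_limits<float>::denorm_min(),
                           std::numeric_limits<float>::min(),
                           std::numeric_limits<float>::infinity(),
                           std::numeric_limits<float>::quiet_NaN()};
  for (float s : samples)
    assert(isZeroOrSubnormal(s) == isZeroOrSubnormalRef(s));
}
```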

https://reviews.llvm.org/D143182
2023-07-06 13:03:57 -04:00
Matt Arsenault
1588e18b2d DAG: Check isCondCodeLegal in is_fpclass expansion to fcmp eq 0
Results in some x86 codegen diffs. Some look better, some look worse.

https://reviews.llvm.org/D152094
2023-07-06 13:00:52 -04:00
Matt Arsenault
9df70e4a4d AMDGPU: Fix not applying the correct default memcpy expansion threshold
Fixes 3c848194f28decca41b7362f9dd35d4939797724. The TTI hook name got
renamed at some point in the process and the target implementation was
left behind.

Fixes: SWDEV-407329
2023-07-06 12:14:14 -04:00
zhijian
d6d7f7b1d2 [AIX][XCOFF] print out the traceback info
Summary:

  Add a new option, -traceback-table, to print out the traceback info of an XCOFF object file.

Reviewers: James Henderson, Fangrui Song, Stephen Peckham, Xing Xue

Differential Revision: https://reviews.llvm.org/D89049
2023-07-06 11:47:08 -04:00
Simon Pilgrim
a69ffd6c73 [X86] isTargetShuffleEquivalent - ensure the reference operands are vector types
Fixes #63700
2023-07-06 15:38:01 +01:00
Matt Arsenault
c70cae6315 AMDGPU: Make SIFixVGPRCopies preserve everything
All this does is add uses of reserved registers, which
aren't tracked by anything. Saves a loop info computation.
2023-07-06 10:26:21 -04:00
Matt Arsenault
8ee1cc82c9 AMDGPU: Fold out sign bit ops on frexp_exp
The sign bit has no impact on the exponent, so strip these away. Saves
on the source modifier encoding cost. I left the GlobalISel handling
until there's a resolution to issue #62628.

We should do this in instcombine too, but legalization should be
introducing more frexps than it currently is where this would occur.
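
The claim that the sign bit does not affect the exponent can be checked on the host with `std::frexp`. This is only a sketch of the property the fold relies on; `frexpExp` and `checkSignIndependence` are hypothetical helper names, and the sample values are arbitrary.

```cpp
#include <cassert>
#include <cmath>

// Returns only the exponent part of frexp, discarding the fraction.
int frexpExp(float x) {
  int e = 0;
  (void)std::frexp(x, &e);
  return e;
}

// The exponent is the same with or without fneg/fabs on the input.
void checkSignIndependence() {
  const float samples[] = {0.75f, 1.0f, 3.5f, 1024.0f, 1.0e-20f};
  for (float s : samples) {
    assert(frexpExp(s) == frexpExp(-s));            // fneg has no effect
    assert(frexpExp(s) == frexpExp(std::fabs(-s))); // fabs has no effect
  }
}
```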
2023-07-06 10:26:21 -04:00
Paul Walker
90b83a6d6c [SVE] Add isel for 32-bit add/sub(cntp()) -> incp/decp.
Patterns already exist for 64-bit that I've simply copied and
converted to include the necessary truncation.

Differential Revision: https://reviews.llvm.org/D154350
2023-07-06 14:25:18 +00:00
Eduard Zingerman
09feee559a [BPF] Undo transformation for LICM.cpp:hoistMinMax()
Extended BPFCheckAndAdjustIR pass with sinkMinMax() transformation
that undoes LICM hoistMinMax pass.

The undo transformation converts the following patterns:

    x < min(a, b) -> x < a && x < b
    x > min(a, b) -> x > a || x > b
    x < max(a, b) -> x < a || x < b
    x > max(a, b) -> x > a && x > b

Where 'a' or 'b' is a constant.
Also supports `sext min(...) ...` and `zext min(...) ...`.
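
The four rewrites listed above are plain logical equivalences, which a brute-force check over a small integer range can confirm. This is a sketch with hypothetical helper names; it does not model the pass, the constant-operand restriction, or the `sext`/`zext` variants.

```cpp
#include <algorithm>
#include <cassert>

// Each helper returns true when the hoisted form (comparison against
// min/max) and the sunk form (two comparisons) give the same answer.
bool ltMinAgrees(int x, int a, int b) {
  return (x < std::min(a, b)) == (x < a && x < b);
}
bool gtMinAgrees(int x, int a, int b) {
  return (x > std::min(a, b)) == (x > a || x > b);
}
bool ltMaxAgrees(int x, int a, int b) {
  return (x < std::max(a, b)) == (x < a || x < b);
}
bool gtMaxAgrees(int x, int a, int b) {
  return (x > std::max(a, b)) == (x > a && x > b);
}

// Exhaustively checks all four rewrites for x, a, b in [lo, hi].
bool allRewritesAgree(int lo, int hi) {
  for (int x = lo; x <= hi; ++x)
    for (int a = lo; a <= hi; ++a)
      for (int b = lo; b <= hi; ++b)
        if (!ltMinAgrees(x, a, b) || !gtMinAgrees(x, a, b) ||
            !ltMaxAgrees(x, a, b) || !gtMaxAgrees(x, a, b))
          return false;
  return true;
}

void checkRewrites() { assert(allRewritesAgree(-8, 8)); }
```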

Differential Revision: https://reviews.llvm.org/D147990
2023-07-06 16:19:59 +03:00
Simon Pilgrim
c63be92fc8 [GlobalISel][X86] Regenerate add/sub legalization tests
2023-07-06 14:09:11 +01:00
Amy Kwan
598cccea80 [AIX][TLS] Generate optimized local-exec access code sequence using X-Form loads/stores
This patch is a follow up to D149722, D152669 and D153645, where a slightly more
optimized code sequence is generated for 64-bit and 32-bit local-exec accesses
when optimizations are turned on.

Handling is added to PPCISelDAGToDAG.cpp in order to check if any D-form loads or
stores that follow a PPCISD::ADD_TLS can be optimized to use an X-Form load or
store. In this particular situation, this allows the ADD_TLS node to be removed
completely.

Differential Revision: https://reviews.llvm.org/D150367
2023-07-06 07:57:05 -05:00
Alex Bradbury
619c6c0e38 [RISCV][test] Add RV32I and RV64I RUN lines to llvm.frexp.ll
Thanks to D154555, these intrinsics no longer crash when used with a
soft float ABI.
2023-07-06 13:36:03 +01:00
Ivan Kosarev
b4049b409b [AMDGPU] Add GlobalISel test coverage for floating-point truncations.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D154527
2023-07-06 11:37:09 +01:00
Simon Pilgrim
3f7470c33d [X86] Fold BITOP(PACKSS(X,Z),PACKSS(Y,W)) --> PACKSS(BITOP(X,Y),BITOP(Z,W)) (REAPPLIED)
Fold allsignbits pack patterns to make better use of cheap (and commutable) logic ops

Reapplied after a32d14fd4c0a / 156913cb7764 with bitcast fix
2023-07-06 10:56:07 +01:00
Simon Pilgrim
bb65e5b881 [X86] Add base SSE2 i686 test coverage to vector bitlogic reduction tests
2023-07-06 10:56:07 +01:00
Simon Pilgrim
819d070e0e [X86] Add base SSE2 i686 test coverage to vector bool reduction tests
2023-07-06 10:56:06 +01:00
Valery Pykhtin
98aa8439f5 [AMDGPU] Fix register class for a subreg in GCNRewritePartialRegUses.
1. Improved the code that deduces the register class from instruction definitions. Previously, if some instruction didn't specify a register class for an operand, the operand was treated as having no register class information, even if other instructions specified the class.

2. Added a check on the required size of the resulting register, because in some cases classes with smaller registers had been selected (for example, VReg_1).

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D152832
2023-07-06 08:48:45 +02:00
Jianjian GUAN
a813a633d5 [RISCV][NFC] Use common prefix to simplify test.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154487
2023-07-06 11:52:51 +08:00
Craig Topper
ee34fa0032 [RISCV] Add DAG combine for (fmv_w_x_rv64 (fmv_x_anyextw_rv64 X))
This pattern started showing up more after D151284
2023-07-05 19:35:13 -07:00
Matt Arsenault
e8ed6e35bd DAG: Implement soften float for ffrexp
Fixes #63661

https://reviews.llvm.org/D154555
2023-07-05 21:42:27 -04:00
Nemanja Ivanovic
7cd9084c69 Revert "[PowerPC] Remove extend between shift and and"
This reverts commit a57236de4eb8f38b4201647b10146941cbbb5c0b.
Causes a bootstrap failure on ppc64be.
2023-07-05 20:04:49 -04:00
Arthur Eubanks
156913cb77 Revert "[X86] Fold BITOP(PACKSS(X,Z),PACKSS(Y,W)) --> PACKSS(BITOP(X,Y),BITOP(Z,W))"
This reverts commit a32d14fd4c0a43c154f251df1ccfe57e8b0a711a.

Causes crashes, see https://reviews.llvm.org/rGa32d14fd4c0a43c154f251df1ccfe57e8b0a711a.
2023-07-05 14:52:57 -07:00
Matt Arsenault
20964c901a DAG: Fix dropping flags when widening unary vector ops
2023-07-05 17:25:24 -04:00
Matt Arsenault
5491666248 AMDGPU: Correctly lower llvm.exp.f32
The library expansion has too many paths for all the permutations of
DAZ, unsafe and the 3 exp functions. It's easier to expand it in the
backend when we know all of these things. The library currently misses
the no-infinity check on the overflow, which this handles optimizing
out.

Some of the <3 x half> fast tests regress due to vector widening
dropping flags which will be fixed separately.

Apparently there is no exp10 intrinsic, but there should be. Adds some
deadish code in preparation for adding one while I'm following along
with the current library expansion.
2023-07-05 17:23:49 -04:00
Matt Arsenault
ed556a1ad5 AMDGPU: Correctly lower llvm.exp2.f32
Previously this did a fast math expansion only.
2023-07-05 17:23:48 -04:00
Oskar Wirga
198df5f682 Weaken MFI Max Call Frame Size Assertion
A year ago when I was not invested at all into compilers, I found an assertion error when building an AArch64 debug build with LTO + CFI, among other combinations.

It was posted as a github issue here: https://github.com/llvm/llvm-project/issues/54088

I took it upon myself to revisit the issue now that I have spent some more time working on LLVM.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D151276
2023-07-05 14:02:51 -07:00
Matt Arsenault
9c82dc6a6b AMDGPU: Always use v_rcp_f16 and v_rsq_f16
These inherited the fast math checks from f32, but the manual suggests
these should be accurate enough for unconditional use. The definition
of correctly rounded is 0.5ulp, but the manual says "0.51ulp". I've
been a bit nervous about changing this as the OpenCL conformance test
does not cover half. Brute force produces identical values compared to
a reference host implementation for all values.
2023-07-05 16:53:01 -04:00
Matt Arsenault
59c311c5d4 AMDGPU: Add more tests for f16 fdiv lowering
Probably should merge the DAG and gisel tests.
2023-07-05 16:53:01 -04:00
Nemanja Ivanovic
a57236de4e [PowerPC] Remove extend between shift and and
The SDAG will sometimes insert an extend between
the shift and an and (immediate) even though the
immediate is narrower than the narrow size.
This does not allow us to produce a rotate
instruction (such as rlwinm).
This patch just adds a combine to move the extend
onto the and.
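
The idea of moving the extend past the mask can be sketched in scalar form. The sketch assumes a zero-extend from 32 to 64 bits and a mask that fits in 32 bits; the commit message does not say which extend kinds the combine handles, and all function names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Shape before the combine: shift in 32 bits, extend to 64 bits, then mask.
std::uint64_t extendThenMask(std::uint32_t x, unsigned sh, std::uint32_t mask) {
  std::uint64_t wide = static_cast<std::uint64_t>(x >> sh); // zero-extend
  return wide & mask;
}

// Shape after the combine: shift and mask both in 32 bits (the form a
// rotate-and-mask instruction can cover), with the extend applied last.
std::uint64_t maskThenExtend(std::uint32_t x, unsigned sh, std::uint32_t mask) {
  std::uint32_t narrow = (x >> sh) & mask;
  return static_cast<std::uint64_t>(narrow);
}

// Both shapes agree for every shift amount below 32 on the sample inputs.
void checkExtendPlacement() {
  const std::uint32_t xs[] = {0u, 1u, 0x12345678u, 0x80000000u, 0xFFFFFFFFu};
  const std::uint32_t masks[] = {0x1u, 0xFFu, 0xFFFFu, 0x7FFFFFFFu};
  for (std::uint32_t x : xs)
    for (unsigned sh = 0; sh < 32; ++sh)
      for (std::uint32_t m : masks)
        assert(extendThenMask(x, sh, m) == maskThenExtend(x, sh, m));
}
```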

Differential revision: https://reviews.llvm.org/D152911
2023-07-05 16:33:07 -04:00
Amaury Séchet
872276de4b [NFC] Autogenerate CodeGen/SystemZ/int-{uadd,sub}-0*.ll
2023-07-05 20:14:43 +00:00
Philip Reames
403261eafd [RISCV] Remove legacy TA/TU pseudo distinction for load instructions
This change continues with the line of work discussed in https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295.

This change targets all the pseudos used in loads (unit, strided, segmented, fault first, and their combinations). As with previous changes in the series, we replace the existing TA and TU forms with a single unified pseudo with a passthru (which may be implicit_def) and a policy operand.

One quirk is that I went ahead and treated the unmasked mask load instruction (vlm) the same way. We need the pass thru operand to model tail undefined, but since the instruction is unconditionally agnostic and the instruction has no mask, the policy operand is arguably unneeded. I kept it mostly for consistency's sake.

Another quirk worth highlighting is that segment loads require a bit of dedicated handling. Surprisingly, we don't have IMPLICIT_DEF nodes of the right types, and attempting to use them results in some odd looking codegen and a few crashes. Instead, I left the REG_SEQUENCE form, and extended InsertVSETVLI to recognize the complex undefs. Arguably, we should probably revisit the handling of undef reg_sequence nodes here, but I'm hoping to side step that in this patch.

As before, we see codegen changes (some improvements and some regressions) due to scheduling differences caused by the extra implicit_def instructions. I did have to delete one register allocation regression test as I couldn't figure out how to meaningfully update it. I spent a significant amount of time trying, and finally gave up.

Differential Revision: https://reviews.llvm.org/D154141
2023-07-05 13:11:58 -07:00
Matt Arsenault
4e15f378ee AMDGPU: Correctly lower llvm.log.f32 and llvm.log10.f32
Previously we expanded these in a fast-math way and the device
libraries were relying on this behavior. The libraries have a pending
change to switch to the new target intrinsic.

Unlike the library version, this takes advantage of no-infinities on
the result overflow check.
2023-07-05 15:30:35 -04:00
Luke Lau
1039aec30b [RISCV] Fix interleave/deinterleave store test output
Looks like the output changed after rebasing
2023-07-05 19:52:50 +01:00
Luke Lau
ea62fc79e7 [RISCV] Lower deinterleave2 intrinsics to vlseg2
Following from D153864, this patch implements the lowerDeinterleaveIntrinsic
hook to lower deinterleaves of loads into vlseg2 intrinsics.
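
As a rough scalar model of why a deinterleave of a load can become a two-field segment load, the sketch below compares "load everything, then split even/odd elements" with "read one two-element segment at a time". The function names are hypothetical and nothing here reflects the actual hook signature.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<int>;

// Model of vector.deinterleave2 on an already loaded wide vector:
// even-indexed elements form the first result, odd-indexed the second.
std::pair<Vec, Vec> deinterleave2(const Vec& wide) {
  Vec even, odd;
  for (std::size_t i = 0; i < wide.size(); ++i)
    (i % 2 == 0 ? even : odd).push_back(wide[i]);
  return {even, odd};
}

// Model of a two-field segment load: each iteration reads one
// {field0, field1} segment directly from memory.
std::pair<Vec, Vec> segmentLoad2(const int* mem, std::size_t segments) {
  Vec f0(segments), f1(segments);
  for (std::size_t i = 0; i < segments; ++i) {
    f0[i] = mem[2 * i];
    f1[i] = mem[2 * i + 1];
  }
  return {f0, f1};
}

// Both routes produce the same pair of vectors.
void checkDeinterleave() {
  const Vec mem = {10, 20, 11, 21, 12, 22, 13, 23};
  auto viaShuffle = deinterleave2(mem); // the "load" is a plain copy here
  auto viaSegment = segmentLoad2(mem.data(), mem.size() / 2);
  assert(viaShuffle == viaSegment);
}
```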

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153876
2023-07-05 19:24:15 +01:00
Luke Lau
86a9bbfdb3 [RISCV] Add tests for vector.deinterleave2s of loads
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D153875
2023-07-05 19:24:10 +01:00
Luke Lau
70093fcf6c [RISCV] Lower interleave2 intrinsics to vsseg2
This patch teaches the RISCV TargetLowering class to lower interleave
intrinsics to vsseg2, so it can lower interleaved stores for scalable vectors.
Previously, we could only lower stores of interleaves for fixed length vectors
with vector shuffles.

This uses the lowerInterleaveIntrinsic interface for the interleaved
access pass that was added in D146218, and subsumes the DAG combine
approach taken in D144175
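
A scalar model may help show why a store of an interleave maps onto a two-field segment store: both produce the same memory image. The names below are hypothetical, and the sketch says nothing about the real `lowerInterleaveIntrinsic` signature or about scalable vectors specifically.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Model of vector.interleave2 followed by a contiguous store:
// the stored image is {a0, b0, a1, b1, ...}.
Vec interleaveThenStore(const Vec& a, const Vec& b) {
  Vec mem;
  const std::size_t n = std::min(a.size(), b.size());
  for (std::size_t i = 0; i < n; ++i) {
    mem.push_back(a[i]);
    mem.push_back(b[i]);
  }
  return mem;
}

// Model of a two-field segment store: segment i receives a[i] and b[i]
// at mem[2*i] and mem[2*i + 1], with no separate interleave step.
Vec segmentStore2(const Vec& a, const Vec& b) {
  const std::size_t n = std::min(a.size(), b.size());
  Vec mem(2 * n);
  for (std::size_t i = 0; i < n; ++i) {
    mem[2 * i] = a[i];
    mem[2 * i + 1] = b[i];
  }
  return mem;
}

// Both routes leave identical contents in memory.
void checkInterleave() {
  const Vec a = {10, 11, 12, 13};
  const Vec b = {20, 21, 22, 23};
  assert(interleaveThenStore(a, b) == segmentStore2(a, b));
}
```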

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D153864
2023-07-05 19:24:05 +01:00
Luke Lau
d914686da2 [RISCV] Add tests for stores of vector.interleave2
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153863
2023-07-05 19:24:01 +01:00
Yusra Syeda
163aad6bcb [SystemZ][z/OS] z/OS ADA codegen and emission
This patch adds support for the ADA (associated data area), doing the following:

- Creates the ADA table to handle displacements
- Emits the ADA section in the SystemZAsmPrinter
- Lowers the ADA_ENTRY node into the appropriate load instruction

Differential Revision: https://reviews.llvm.org/D153788
2023-07-05 13:21:52 -04:00
Igor Kirillov
7f20407cee [CodeGen] Add support for Splats in ComplexDeinterleaving pass
This commit allows generating complex number intrinsics for expressions
with constants or loop invariants, which are represented as splats.
For instance, after vectorizing loops in the following code snippets,
the ComplexDeinterleaving pass will be able to generate complex number
intrinsics:

```
complex<> x = ...;
for (int i = 0; i < N; ++i)
    c[i] = a[i] * b[i] * x;
```

or

```
for (int i = 0; i < N; ++i)
    c[i] = a[i] * b[i] * (11.0 + 3.0i);
```
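
To make the "splat" wording concrete, the following sketch spells out the second loop on an interleaved real/imaginary layout, where the constant contributes the same (real, imaginary) pair to every element. It is a host-side illustration only, not what the pass emits; `mulInterleaved` and `checkSplatMultiply` are hypothetical names, and the inputs are small integer-valued floats so that the comparison with `std::complex` is exact.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cf = std::complex<float>;

// Interleaved layout: v[2*i] is the real part and v[2*i + 1] the imaginary
// part of element i. Computes c[i] = a[i] * b[i] * (xr + xi*i), where the
// pair (xr, xi) is identical for every element, i.e. a splat.
std::vector<float> mulInterleaved(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  float xr, float xi) {
  std::vector<float> c(a.size());
  for (std::size_t i = 0; i + 1 < a.size() && i + 1 < b.size(); i += 2) {
    float pr = a[i] * b[i] - a[i + 1] * b[i + 1]; // real part of a*b
    float pi = a[i] * b[i + 1] + a[i + 1] * b[i]; // imaginary part of a*b
    c[i] = pr * xr - pi * xi;
    c[i + 1] = pr * xi + pi * xr;
  }
  return c;
}

// Compares the interleaved computation against std::complex arithmetic.
void checkSplatMultiply() {
  const std::vector<float> a = {1.0f, 2.0f, 3.0f, 4.0f};
  const std::vector<float> b = {5.0f, 6.0f, 7.0f, 8.0f};
  const std::vector<float> c = mulInterleaved(a, b, 11.0f, 3.0f);
  const cf x(11.0f, 3.0f);
  for (std::size_t i = 0; i < 2; ++i) {
    cf ref = cf(a[2 * i], a[2 * i + 1]) * cf(b[2 * i], b[2 * i + 1]) * x;
    assert(c[2 * i] == ref.real());
    assert(c[2 * i + 1] == ref.imag());
  }
}
```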

Differential Revision: https://reviews.llvm.org/D153355
2023-07-05 17:02:52 +00:00