52796 Commits

Author SHA1 Message Date
Luke Lau
74f985b793
[RISCV] Remove -riscv-v-vector-bits-min in tests. NFC (#65404)
V implies Zvl128b, but a lot of the fixed vector tests also redundantly
specify -riscv-v-vector-bits-min=128. This patch removes them where
there isn't another minimum vlen being tested for, and for cases where
Zve* is being used Zvl128b was added to maintain the old test diff (and
because an awkward vlen probably isn't interesting to test for). Other
places where -riscv-v-vector-bits-min was being used were replaced with
Zvl.
2023-09-06 10:43:41 +01:00
Dmitri Gribenko
97bf104d97 Revert "[DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 << c2)"
This reverts commit b027ce0ab93060bc6cb79d5402d21520e8b93fb7.

This commit breaks Transforms/InferAddressSpaces/AMDGPU/flat_atomic.ll.
2023-09-06 11:28:55 +02:00
Simon Pilgrim
b027ce0ab9 [DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 << c2)
Assuming the ADD is nsw then it may be sign-extended to merge with a SHL op in a similar fold to the existing (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) fold.

This is most useful for helping to expose address math for X86, but has also touched several aarch64 test cases as well.

Alive2: https://alive2.llvm.org/ce/z/2UpSbJ

Differential Revision: https://reviews.llvm.org/D159198
2023-09-06 10:06:21 +01:00
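The fold in b027ce0ab9 can be sanity-checked on small bit widths. The sketch below is only an illustration, assuming an i8 value sign-extended to i16; the helper names (`sext`, `fold_lhs`, `fold_rhs`, `check_fold`) are hypothetical and are not LLVM APIs. Inputs where the narrow add would overflow are skipped, since nsw makes them poison.

```python
def sext(value, from_bits, to_bits):
    """Sign-extend the low from_bits of value to a to_bits-wide bit pattern."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def add_overflows_signed(x, c1, bits):
    """True if x + c1 overflows as a signed add of this width (violates nsw)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    sx = x - (1 << bits) if x & (1 << (bits - 1)) else x
    sc = c1 - (1 << bits) if c1 & (1 << (bits - 1)) else c1
    return not lo <= sx + sc <= hi


def fold_lhs(x, c1, c2, narrow=8, wide=16):
    """(shl (sext (add_nsw x, c1)), c2)"""
    total = (x + c1) & ((1 << narrow) - 1)
    return (sext(total, narrow, wide) << c2) & ((1 << wide) - 1)


def fold_rhs(x, c1, c2, narrow=8, wide=16):
    """(add (shl (sext x), c2), c1 << c2), with c1 sign-extended to the wide type."""
    mask = (1 << wide) - 1
    shifted_x = (sext(x, narrow, wide) << c2) & mask
    shifted_c1 = (sext(c1, narrow, wide) << c2) & mask
    return (shifted_x + shifted_c1) & mask


def check_fold(c1, c2, narrow=8, wide=16):
    """Compare both forms for every x where the narrow add does not overflow."""
    for x in range(1 << narrow):
        if add_overflows_signed(x, c1, narrow):
            continue
        if fold_lhs(x, c1, c2, narrow, wide) != fold_rhs(x, c1, c2, narrow, wide):
            return False
    return True
```

Without the nsw guarantee the two forms can differ: for x = 127 and c1 = 1 the i8 add wraps to -128 before the sign extension, while the folded form computes 128 in the wide type.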
Kito Cheng
af9b25f9db [RISCV] Optimize floating point scalar move and splat
D158086 prevented every floating-point scalar move and splat from fusing
with a vsetvli that has a different SEW. This patch relaxes that constraint
as far as possible by introducing a new SEW demand type,
SEWGreaterThanOrEqualAndLessThan64, which allows fusing with a larger
SEW but not with SEW=64.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D158177
2023-09-06 16:39:30 +08:00
laichunfeng
71b5f57f0d [RISCV] Adjust first sp size to use c.addi16sp.
addi sp, sp, 512 may be used to restore sp in the epilogue
when the stack size is larger than 2047 (2^11 - 1). However, it cannot
be compressed using the C extension, whereas addi sp, sp, 496 can be
compressed. So try to use 496 as the adjustment amount for the first sp
adjustment if the function doesn't need extra instructions after the
adjustment.

Reviewed By: wangpc

Differential Revision: https://reviews.llvm.org/D159431
2023-09-06 14:26:52 +08:00
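The choice of 496 over 512 in 71b5f57f0d follows from the immediate ranges of the two instructions. A minimal sketch, assuming the standard encodings (a 12-bit signed immediate for addi, and a nonzero multiple of 16 in [-512, 496] for c.addi16sp); the function names are hypothetical:

```python
def fits_addi(imm):
    """addi takes a 12-bit signed immediate."""
    return -2048 <= imm <= 2047


def fits_c_addi16sp(imm):
    """c.addi16sp takes a nonzero multiple of 16 in the range [-512, 496]."""
    return imm != 0 and imm % 16 == 0 and -512 <= imm <= 496
```

Both 512 and 496 fit in a plain addi, but only 496 also fits the compressed form, which is why the positive epilogue adjustment benefits from the smaller first step.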
Ting Wang
71be020dda [SelectionDAG][PowerPC] Memset reuse vector element for tail store
On PPC there are instructions that store an element from a vector (e.g.
stxsdx/stxsiwx), and these instructions can be leveraged to avoid the tail
constant in memset and constant splat array initialization.

This patch tries to explore these opportunities.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D138883
2023-09-06 01:52:38 -04:00
Pravin Jagtap
b230472f22
[AMDGPU] Extend v2i16 & v2f16 support for llvm.amdgcn.update.dpp intr (#65318)
Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2023-09-06 10:20:34 +05:30
Craig Topper
2a7b8ab07c
[RISCV] Use add.uw for (or (and X, 0xFFFFFFFF), Y) if Y has zeroes in the lower 32 bits. (#65402) 2023-09-05 21:05:53 -07:00
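The condition in 2a7b8ab07c matters because an or only equals an add when the operands share no set bits. The sketch below models this on 64-bit values, assuming the Zba semantics of add.uw (rs2 plus the zero-extended low 32 bits of rs1); the Python function names are illustrative only.

```python
MASK64 = (1 << 64) - 1
LOW32 = 0xFFFFFFFF


def add_uw(rs1, rs2):
    """Model of add.uw on RV64: rs2 plus the zero-extended low 32 bits of rs1."""
    return (rs2 + (rs1 & LOW32)) & MASK64


def or_of_masked(x, y):
    """The matched pattern: (or (and X, 0xFFFFFFFF), Y)."""
    return ((x & LOW32) | y) & MASK64


def low32_is_zero(y):
    """The precondition on Y: no bits set in the lower 32 bits."""
    return (y & LOW32) == 0
```

When Y has a bit set in the low half the two results can diverge, for example X = 1 and Y = 1 give 1 for the or but 2 for the add.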
Amara Emerson
6c31f20fee
[GlobalISel] Fold fmul x, 1.0 -> x (#65379) 2023-09-06 03:14:16 +08:00
Amy Kwan
f0b2f69541 [AIX][TLS] Generate .extern and .ref references to __tls_get_addr for local-exec accesses.
Compiling with TLS variables requires -pthread, but if the user omits this
option, the compiler will not show any obvious indication during compilation
that -pthread is needed for programs using TLS variables. Instead, the user will
experience a segmentation fault when running programs with TLS variables in them
and without specifying -pthread.

This patch aims to generate .extern/.ref references to __tls_get_addr[DS] for
local-exec accesses, in order to trigger an error from the linker to indicate
that there is an undefined symbol to __tls_get_addr. Doing so will remind the
user to compile/link with -pthread.

Differential Revision: https://reviews.llvm.org/D151335
2023-09-05 12:15:14 -05:00
Simon Pilgrim
e086e0aeef [X86] Add test coverage for new smulo folds added in D159406
Pulled from the InstCombine with_overflow.ll tests
2023-09-05 17:43:42 +01:00
Philip Reames
de34d39b66 [RISCV] Cap build vector cost to avoid quadratic cost at high LMULs
Each vslide1down operation is linear in LMUL on common hardware. (For instance, the sifive-x280 cost model models slides this way.) If we do VL unique inserts, each with a cost linear in LMUL, the overall cost is O(VL*LMUL). Since VL is a linear function of LMUL, this means the current lowering is quadratic in both LMUL and VL. To avoid the degenerate case, fall back to the stack if the cost is more than a fixed (linear) threshold.

For context, here's the sifive-x280 llvm-mca results for the current lowering and stack based lowering for each LMUL (using e64). Assumes code was compiled for V (i.e. zvl128b).
  buildvector_m1_via_stack.mca:Total Cycles: 1904
  buildvector_m2_via_stack.mca:Total Cycles: 2104
  buildvector_m4_via_stack.mca:Total Cycles: 2504
  buildvector_m8_via_stack.mca:Total Cycles: 3304
  buildvector_m1_via_vslide1down.mca:Total Cycles:  804
  buildvector_m2_via_vslide1down.mca:Total Cycles:  1604
  buildvector_m4_via_vslide1down.mca:Total Cycles:  6400
  buildvector_m8_via_vslide1down.mca:Total Cycles: 25599

There are other schemes we could use to cap the cost. The next best is recursive decomposition of the vector into smaller LMULs. That's still quadratic, but with a better constant. However, stack based seems to cost better on all LMULs, so we can just go with the simpler scheme.

Arguably, this patch is fixing a regression introduced with my D149667, as before that change we'd always fall back to the stack, and thus didn't have the non-linearity.

Differential Revision: https://reviews.llvm.org/D159332
2023-09-05 09:03:26 -07:00
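The llvm-mca totals quoted in de34d39b66 make the growth pattern easy to check by hand. The sketch below only re-uses the cycle counts from that commit message; the dictionary and function names are hypothetical, and nothing here reflects the actual threshold used by the patch.

```python
# Total cycles copied from the llvm-mca listing in the commit message (e64, zvl128b).
STACK_CYCLES = {1: 1904, 2: 2104, 4: 2504, 8: 3304}
VSLIDE1DOWN_CYCLES = {1: 804, 2: 1604, 4: 6400, 8: 25599}


def growth_ratios(cycles):
    """Ratio of total cycles between each LMUL and the previous (halved) LMUL."""
    lmuls = sorted(cycles)
    return {b: cycles[b] / cycles[a] for a, b in zip(lmuls, lmuls[1:])}


def cheaper_lowering(lmul):
    """Name of the lowering with fewer total cycles at this LMUL."""
    if STACK_CYCLES[lmul] < VSLIDE1DOWN_CYCLES[lmul]:
        return "stack"
    return "vslide1down"
```

On these figures the vslide1down sequence is ahead at m1 and m2 and behind at m4 and m8, and its cost roughly quadruples for each doubling of LMUL from m2 upward, while the stack lowering grows by well under 2x per step.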
Craig Topper
fa31ce5320
[RISCV][GISel] Add gisel-commandline-option.ll similar to AArch64. (#65299)
This allows us to see the pass pipeline for GlobalISel.
2023-09-05 09:01:50 -07:00
Amara Emerson
08e04209d8
[GlobalISel] Commute G_FMUL and G_FADD constant LHS to RHS. (#65298) 2023-09-05 23:48:34 +08:00
Luke Lau
2fc6fadeaf [RISCV] Fix typo in test title. NFC 2023-09-05 15:57:18 +01:00
Vladislav Dzhidzhoev
13b7629a58 [GlobalISel][AArch64] Combine unmerge(G_EXT v, undef) to unmerge(v).
When having <N x t> d1, unused = unmerge(G_EXT <2*N x t> v1, undef, N),
it is possible to express it just as unused, d1 = unmerge v1.

It is useful for tackling regressions in arm64-vcvt_f.ll, introduced in
https://reviews.llvm.org/D144670.
2023-09-05 16:14:44 +02:00
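The equivalence behind 13b7629a58 can be modelled with plain lists. This is a sketch under assumptions: vectors are Python lists, `None` stands in for an undef lane, and the third G_EXT operand is treated as a lane offset for readability. The helper names are hypothetical.

```python
UNDEF = None  # stand-in for an undefined lane


def g_ext(v1, v2, offset):
    """Concatenate v1 and v2, then take len(v1) lanes starting at offset."""
    assert len(v1) == len(v2)
    return (v1 + v2)[offset:offset + len(v1)]


def unmerge(v):
    """Split a vector into its low half and its high half."""
    half = len(v) // 2
    return v[:half], v[half:]
```

With a 2*N-lane input and offset N, the low half of the G_EXT result is exactly the high half of the input, so unmerging the input directly yields the same d1 and leaves the other result unused.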
Vladislav Dzhidzhoev
7eeeeb0cc9 Revert "[GlobalISel][AArch64] Combine unmerge(G_EXT v, undef) to unmerge(v)."
This reverts commit 6b37a65264bb4e7d400d5283a65f9e8e1575f2d7.
Accidentally pushed before squashing.
2023-09-05 16:13:27 +02:00
Vladislav Dzhidzhoev
0e826f0e6d Refactored, added MIR test. 2023-09-05 16:00:48 +02:00
Vladislav Dzhidzhoev
6b37a65264 [GlobalISel][AArch64] Combine unmerge(G_EXT v, undef) to unmerge(v).
When having <N x t> d1, unused = unmerge(G_EXT <2*N x t> v1, undef, N),
it is possible to express it just as unused, d1 = unmerge v1.

It is useful for tackling regressions in arm64-vcvt_f.ll, introduced in
https://reviews.llvm.org/D144670.
2023-09-05 16:00:48 +02:00
Jingu Kang
67fc0d3d39 [AArch64] Remove copy instruction between uaddlv and dup
If there are copy instructions between uaddlv and dup for transfer from gpr to
fpr, try to remove them with duplane.

Differential Revision: https://reviews.llvm.org/D159267
2023-09-05 14:41:28 +01:00
David Sherwood
50598f0ff4 [DAGCombiner][SVE] Add support for illegal extending masked loads
In some cases where the same mask is used for multiple
extending masked loads it can be more efficient to combine
the zero- or sign-extend into the load even if it's not a
legal or custom operation. This leads to splitting up the
extending load into smaller parts, which also requires
splitting the mask. For SVE at least this improves the
performance of the SPEC benchmark x264 slightly on
neoverse-v1 (~0.3%), and at least one other benchmark
improves by around 30%. The uplift for SVE seems due to
removing the dependencies (vector unpacks) introduced
between the loads and the vector operations, since this
should increase the level of parallelism.

See tests:

  CodeGen/AArch64/sve-masked-ldst-sext.ll
  CodeGen/AArch64/sve-masked-ldst-zext.ll

https://reviews.llvm.org/D159191
2023-09-05 10:41:21 +00:00
David Sherwood
64094e3e6d [DAGCombiner] Pre-commit tests for D159191
I've added some missing tests for the following cases:

1. Zero- and sign-extends from unpacked vector types to wide,
   illegal types. For example,
   %aext = zext <vscale x 4 x i8> %a to <vscale x 4 x i64>
2. Normal loads combined with 1
3. Masked loads combined with 1

Differential Revision: https://reviews.llvm.org/D159192
2023-09-05 10:41:21 +00:00
Amara Emerson
12e4921709
[GlobalISel] Constant fold sitofp/uitofp of 0. (#65307) 2023-09-05 17:33:57 +08:00
pvanhout
844c0da777 [TableGen][GlobalISel] Add MIR Pattern Builtins
Adds a new feature to MIR patterns: builtin instructions.
They offer some additional capabilities that currently cannot be expressed without falling back to C++ code.
There are two builtins added with this patch, but more can be added later as new needs arise:
 - GIReplaceReg
 - GIEraseRoot

Depends on D158714, D158713

Reviewed By: arsenm, aemerson

Differential Revision: https://reviews.llvm.org/D158975
2023-09-05 08:19:07 +02:00
Qiu Chaofan
082c5d7f63 [PowerPC] Implement builtin for mffsl
mffsl is available since ISA 3.0. The builtin is named with ppc prefix
to follow our convention. For targets earlier than power9, GCC generates
extra code to support the functionality, while this patch does not
implement such behavior.

Reviewed By: nemanjai, tuliom

Differential Revision: https://reviews.llvm.org/D158065
2023-09-05 11:22:09 +08:00
Nicolai Hähnle
62790a8d4a AMDGPU: Fix test from previous commit 2023-09-05 00:31:49 +02:00
Nicolai Hähnle
f5fb6ad2e5 AMDGPU: Precommit a test file
Demonstrates bad scheduling for private load/store vs. buffer
intrinsics.
2023-09-05 00:17:46 +02:00
Amara Emerson
91746d15d2
[GlobalISel] Fix G_PTR_ADD immediate chain combine using the wrong im… (#65271) 2023-09-05 08:06:40 +08:00
Jay Foad
71ca53b6cf
[GlobalISel] Lower G_SHUFFLE_VECTOR with scalar result (#65275) 2023-09-04 13:32:43 -04:00
Simon Pilgrim
e6971cbc06 [X86] combine-mulo.ll - add common CHECK prefix for SSE/AVX test runs 2023-09-04 16:42:48 +01:00
Amara Emerson
f51b7992c9 [GlobalISel] Precommit a ptradd combine test. 2023-09-04 08:27:20 -07:00
Vladislav Dzhidzhoev
a15144f2ba [AArch64][GlobalISel] Lower G_EXTRACT_VECTOR_ELT with variable indices
G_EXTRACT_VECTOR_ELT instructions with non-constant indices are not
selected, so they need to be lowered.

Fixes https://github.com/llvm/llvm-project/issues/65049.

Reviewed By: Peter

Differential Revision: https://reviews.llvm.org/D159096
2023-09-04 16:19:16 +02:00
sdesmalen-arm
dbf9b93f25
[AArch64][SME] Disable tail-call optimization for __arm_locally_streaming functions. (#65258)
When calling a function which requires no streaming-mode change from an
__arm_locally_streaming function, LLVM would otherwise emit:

  // function prologue
  smstart
  b streaming_compatible_function   // tail call
  // never an smstop
2023-09-04 15:11:22 +01:00
John Brawn
fae3f9ec4f [ARM] Fix prologue/epilogue for pacbti-m leaf functions
R12 is callee-saved in functions with pacbti-m enabled, but this is
done in assignCalleeSavedSpillSlots, meaning that in
determineCalleeSaves we have to manually set CanEliminateFrame.

This fixes a bug where in leaf functions with no other callee-saved
registers the aut instruction wouldn't be emitted and stack offsets
of arguments passed on the stack would be incorrect.

Differential Revision: https://reviews.llvm.org/D157865
2023-09-04 13:46:01 +01:00
Sander de Smalen
702c3f56d3 [SME] Don't scavenge a spillslot in callee-save area in presence of streaming-mode changes.
If no frame-pointer is available and the compiler has scavenged a
spill-slot in the callee-save area, the compiler may be forced to emit an
'addvl' inside the streaming-mode-changing call sequence when it needs to
fill (reload) an FP register being passed to the call.

We can avoid this entirely by disabling stack-slot scavenging when there
are streaming-mode-changing call-sequences in the function.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D159196
2023-09-04 10:14:44 +00:00
Luke Lau
6098d7d5f6 [RISCV] Lower shuffles as rotates without zvbb
Now that the codegen for the expanded ISD::ROTL sequence has been improved,
it's probably profitable to lower a shuffle that's a rotate to the
vsll+vsrl+vor sequence to avoid a vrgather where possible, even if we don't
have the vror instruction.

This patch relaxes the restriction on ISD::ROTL being legal in
lowerVECTOR_SHUFFLEAsRotate. It also attempts to do the lowering twice: Once
if zvbb is enabled before any of the interleave/deinterleave/vmerge lowerings,
and a second time unconditionally just before it falls back to the vrgather.
This way it doesn't interfere with any of the above patterns that may be more
profitable than the expanded ISD::ROTL sequence.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D159353
2023-09-04 09:35:12 +01:00
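To illustrate the idea in 6098d7d5f6, the sketch below shows a byte shuffle that is equivalent to rotating wider lanes, with the rotate built from shift-left, shift-right and or (the vsll+vsrl+vor shape). It assumes little-endian packing of byte pairs into 16-bit lanes; the helper names are hypothetical and do not correspond to the LLVM lowering code.

```python
def rotl(value, amount, bits):
    """Rotate left using only shift-left, shift-right and or."""
    mask = (1 << bits) - 1
    value &= mask
    amount %= bits
    return ((value << amount) | (value >> (bits - amount))) & mask


def shuffle(lanes, mask):
    """Pick lanes[i] for each index i in the shuffle mask."""
    return [lanes[i] for i in mask]


def bytes_to_i16(lanes):
    """Pack consecutive byte pairs into little-endian 16-bit lanes."""
    return [lanes[i] | (lanes[i + 1] << 8) for i in range(0, len(lanes), 2)]
```

For example, the byte shuffle mask [1, 0, 3, 2] swaps the two bytes inside every 16-bit lane, which is the same as rotating each 16-bit lane by 8.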
Amara Emerson
0065640f40 [GlobalISel] Look through a G_PTR_ADD's def register instead of its source operand's
uses when looking for load/store users. This was a simple logic bug during translation
of the equivalent function in SelectionDAG:
```
    for (SDNode *Node : N->uses()) {
      if (auto *LoadStore = dyn_cast<MemSDNode>(Node)) {
```
2023-09-04 00:28:57 -07:00
Amara Emerson
59cbee4599 [GlobalISel] Fix an incorrect ptradd reassoc test. NFC.
The look-through int<->ptr cast tests and code were both checking the wrong
register uses. This change fixes and precommits the test to prepare for
the code fix.
2023-09-04 00:28:56 -07:00
Amara Emerson
69d8ca21af [GlobalISel] Regenerate ptradd reassociation tests checks. 2023-09-04 00:03:38 -07:00
Matt Arsenault
65b40f273f RegAlloc: Rename MLRegalloc* files to use consistent capitalization
The other regalloc related files use RegAlloc, not Regalloc.
2023-09-03 09:00:27 -04:00
Simon Pilgrim
d9ffd3219e [X86] combineCMP - attempt to simplify KSHIFTR mask element extractions when just comparing against zero (REAPPLIED)
We can just bitcast the pre-shifted mask as an integer and use TEST/BT directly.

Reapplied with fix for 239ab16ec121 which didn't set the comparison type correctly
2023-09-02 17:45:17 +01:00
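The observation in d9ffd3219e is that extracting a mask element by shifting and then comparing it against zero asks the same question as testing one bit of the unshifted mask. A minimal sketch on integers, assuming the mask is viewed as a plain bit vector; the function names are hypothetical:

```python
def extract_via_shift(mask, index, width=16):
    """Shift the mask right by index and read lane 0, as a shift-based extraction would."""
    return ((mask & ((1 << width) - 1)) >> index) & 1


def bit_is_set(mask, index):
    """Test bit `index` of the pre-shifted mask directly."""
    return (mask & (1 << index)) != 0
```

For any index inside the mask width the two agree, which is why the shift can be dropped when the only use is a comparison against zero.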
Simon Pilgrim
600b4634ac [X86] Add test to check that an extracted bool element comparison is correctly extended when the bool vector is bitcast instead
Thanks to @zequanwu for the reduced test case where 239ab16ec121 failed to correctly cast a compare-with-zero to the correct integer type
2023-09-02 17:34:12 +01:00
Matt Arsenault
1f52060000 AMDGPU: Use poison instead of undef in module lds pass 2023-09-02 11:33:26 -04:00
Xiang Li
c21cd168bb
[DirectX backend] Avoid generating redundant bitcast in DXILPrepareModule (#65163)
When emitting a no-op bitcast for a GEP pointer operand, SourceElementType
should be used instead of ResultElementType.

**Behavior Before Change**
A redundant bitcast like
   ` bitcast ptr addrspace(3) @gs to ptr addrspace(3)`
 is generated for llvm/test/CodeGen/DirectX/typed_ptr.ll

**Behavior After Change**
  No bitcast is generated.

Fixes https://github.com/llvm/llvm-project/issues/65183
2023-09-01 20:08:39 -04:00
Matt Arsenault
b14e83d1a4 IR: Add llvm.exp10 intrinsic
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10
to fix this asymmetry. AMDGPU already has most of the code for f32
exp10 expansion implemented alongside exp, so the current
implementation is duplicating nearly identical effort between the
compiler and library which is inconvenient.

https://reviews.llvm.org/D157871
2023-09-01 19:45:03 -04:00
Zequan Wu
57595086db fix revert b0b3f82dd3c00cdba891f1ff6ba63abd419d0f18 2023-09-01 17:01:43 -04:00
Zequan Wu
b0b3f82dd3 Revert "[X86] combineCMP - attempt to simplify KSHIFTR mask element extractions when just comparing against zero"
This reverts commit 239ab16ec1213749a2228368298519b377d336bb.

This causes crashes when compiling chromium with asan, attached reduced ir at: https://reviews.llvm.org/rG239ab16ec1213749a2228368298519b377d336bb#124559
2023-09-01 16:35:55 -04:00
Philip Reames
8347d7c152 Revert "Revert "[X86] combineCMP - attempt to simplify KSHIFTR mask element extractions when just comparing against zero""
This reverts commit 460bba35211853b6278ddd6064f7228db02da132.  Change does not pass check-llvm.
2023-09-01 12:17:36 -07:00
Zequan Wu
460bba3521 Revert "[X86] combineCMP - attempt to simplify KSHIFTR mask element extractions when just comparing against zero"
This reverts commit 239ab16ec1213749a2228368298519b377d336bb.

This causes crashes when compiling chromium with asan, attached reduced ir at: https://reviews.llvm.org/rG239ab16ec1213749a2228368298519b377d336bb#1245595
2023-09-01 15:06:04 -04:00
Philip Reames
7c4f455992 [RISCV] Add test coverage for fully scalarizing a vector
This pattern comes up heavily when partially vectorizing a forest in SLP.
2023-09-01 11:17:42 -07:00