52796 Commits

Author SHA1 Message Date
Sean Fertile
cef56b9318 Revert "[XCOFF][AIX] Peephole optimization for toc-data."
This reverts commit 5e28d30f1fb10faf2db2f8bf0502e7fd72e6ac2e.
2023-08-15 10:40:35 -04:00
Sean Fertile
ce658829c9 Revert "[PPC][AIX] Fix toc-data peephole bug and some related cleanup."
This reverts commit b37c7ed0c95c7f24758b1532f04275b4bb65d3c1.
2023-08-15 10:40:35 -04:00
Jay Foad
fdbc944385 Fix typos in comments 2023-08-15 13:57:21 +01:00
Jingu Kang
9f8dcb0706 [AArch64] Try to detect patterns with fdiv and fmul for [su]cvtf.
If fmul's constant operand is the reciprocal of a power of 2 (i.e. 1/2^n) or
fdiv's constant operand is a power of 2, we can try to match patterns with
[su]int_to_fp for [su]cvtf.
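
As a rough illustration of why the two forms are interchangeable, the sketch below models a signed fixed-point conversion with `FBits` fractional bits as `value / 2^FBits` and compares it with the fmul and fdiv spellings. The function names are hypothetical, and the model of the fixed-point conversion is an assumption of this sketch rather than something taken from the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed model of a signed fixed-point to float conversion: the integer is
// interpreted as having FBits fractional bits, i.e. Raw / 2^FBits.
float fixedToFloat(int32_t Raw, int FBits) {
  assert(FBits >= 1 && FBits <= 32);
  return std::ldexp(static_cast<float>(Raw), -FBits);
}

// int-to-fp followed by fmul with the constant 1/2^FBits.
float viaFMul(int32_t Raw, int FBits) {
  assert(FBits >= 1 && FBits <= 32);
  return static_cast<float>(Raw) * std::ldexp(1.0f, -FBits);
}

// int-to-fp followed by fdiv with the constant 2^FBits.
float viaFDiv(int32_t Raw, int FBits) {
  assert(FBits >= 1 && FBits <= 32);
  return static_cast<float>(Raw) / std::ldexp(1.0f, FBits);
}
```

Scaling by a power of two only changes the exponent, so in this model all three functions agree whenever no overflow or underflow occurs.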

Differential Revision: https://reviews.llvm.org/D156538
2023-08-15 10:57:07 +01:00
Jay Foad
f0e5f73fdc [MachineScheduler] Account for lane masks in basic block liveins
Differential Revision: https://reviews.llvm.org/D157633
2023-08-15 09:52:43 +01:00
wangpc
ac00cca3d9 [RISCV] Fix assertion when passing f64 vectors via integer registers
The vector arguments are split but assignments won't be pending.

Fixes #64645

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D157847
2023-08-15 12:11:08 +08:00
wangpc
61ab106f82 [RISCV] Add tune features of preferred function/loop align
D144048 has added preferred function and loop alignment to
RISCVSubtarget, but now we need to set them manually for
different processors.

Tune features that set preferred function/loop align to
[2, 64] bytes (align 1 is not here since the min align is 2)
are added. These features can be used in processor
definitions.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D157832
2023-08-15 12:04:12 +08:00
Eduard Zingerman
08d92dedd2 [BPF] Fix in/out argument constraints for CORE_MEM instructions
When LLVM is built with the `LLVM_ENABLE_EXPENSIVE_CHECKS=ON` option, the
following C code snippet:

    struct t {
      int a;
    } __attribute__((preserve_access_index));

    void test(struct t *t) {
      t->a = 42;
    }

Causes an assertion:

    $ clang -g -O2 -c --target=bpf -mcpu=v2 t.c -o /dev/null

    Function Live Ins: $r1 in %0

    bb.0.entry:
      liveins: $r1
      DBG_VALUE $r1, $noreg, !"t", ...
      %0:gpr = COPY $r1
      DBG_VALUE %0:gpr, $noreg, !"t", ...
      %1:gpr = LD_imm64 @"llvm.t:0:0$0:0"
      %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr
      %4:gpr = MOV_ri 42
      CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ...
      RET debug-location !25; t.c:7:1

    *** Bad machine code: Explicit definition marked as use ***
    - function:    test
    - basic block: %bb.0 entry (0x6210000d8a90)
    - instruction: CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ...
    - operand 0:   killed %4:gpr

This happens because the `CORE_MEM` instruction is defined to have output
operands:

  def CORE_MEM : TYPE_LD_ST<BPF_MEM.Value, BPF_W.Value,
                            (outs GPR:$dst),
                            (ins u64imm:$opcode, GPR:$src, u64imm:$offset),
                            "$dst = core_mem($opcode, $src, $offset)",
                            []>;

As documented in [1]:

> By convention, the LLVM code generator orders instruction operands
> so that all register definitions come before the register uses, even
> on architectures that are normally printed in other orders.

In other words, the first argument for `CORE_MEM` is considered to be
a "def", while in reality it is a "use":

  %1:gpr = LD_imm64 @"llvm.t:0:0$0:0"
  %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr
  %4:gpr = MOV_ri 42
   '---------------.
                   v
  CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ...

Here is how `CORE_MEM` is constructed in
`BPFMISimplifyPatchable::checkADDrr()`:

    BuildMI(*DefInst->getParent(), *DefInst, DefInst->getDebugLoc(), TII->get(COREOp))
        .add(DefInst->getOperand(0)).addImm(Opcode).add(*BaseOp)
        .addGlobalAddress(GVal);

Note that the first operand is constructed as `.add(DefInst->getOperand(0))`.

For `LD{D,W,H,B}` instructions, `DefInst->getOperand(0)` is the
destination register of a load, so the instruction is constructed in
accordance with the `outs` declaration.

For `ST{D,W,H,B}` instructions, `DefInst->getOperand(0)` is the
source register of a store (the value to be stored), so the instruction
violates the `outs` declaration.

This commit fixes the issue by splitting `CORE_MEM` in three
instructions: `CORE_ST`, `CORE_LD64`, `CORE_LD32` with correct `outs`
specifications.

[1] https://llvm.org/docs/CodeGenerator.html#the-machineinstr-class

Differential Revision: https://reviews.llvm.org/D157806
2023-08-15 02:34:21 +03:00
Eduard Zingerman
27026fe563 [BPF] Reset machine register kill mark in BPFMISimplifyPatchable
When LLVM is built with the `LLVM_ENABLE_EXPENSIVE_CHECKS=ON` option,
the following C code snippet:

    struct t {
      unsigned long a;
    } __attribute__((preserve_access_index));

    void foo(volatile struct t *t, volatile unsigned long *p) {
      *p = t->a;
      *p = t->a;
    }

Causes an assertion:

    $ clang -g -O2 -c --target=bpf -mcpu=v2 t2.c -o /dev/null

    # After BPF PreEmit SimplifyPatchable
    # Machine code for function foo: IsSSA, TracksLiveness
    Function Live Ins: $r1 in %0, $r2 in %1

    bb.0.entry:
      liveins: $r1, $r2
      DBG_VALUE $r1, $noreg, !"t", !DIExpression()
      DBG_VALUE $r2, $noreg, !"p", !DIExpression()
      %1:gpr = COPY $r2
      DBG_VALUE %1:gpr, $noreg, !"p", !DIExpression()
      %0:gpr = COPY $r1
      DBG_VALUE %0:gpr, $noreg, !"t", !DIExpression()
      %2:gpr = LD_imm64 @"llvm.t:0:0$0:0"
      %4:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr
      %5:gpr = CORE_LD 344, %0:gpr, @"llvm.t:0:0$0:0"
      STD killed %5:gpr, %1:gpr, 0
      %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr
      %8:gpr = CORE_LD 344, %0:gpr, @"llvm.t:0:0$0:0"
      STD killed %8:gpr, %1:gpr, 0
      RET

    # End machine code for function foo.

    *** Bad machine code: Using a killed virtual register ***
    - function:    foo
    - basic block: %bb.0 entry (0x6210000e6690)
    - instruction: %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr
    - operand 2:   killed %2:gpr

This happens because of the way
`BPFMISimplifyPatchable::processDstReg()` updates the second operand of the
`ADD_rr` instruction. Code before `BPFMISimplifyPatchable`:

    .-> %2:gpr = LD_imm64 @"llvm.t:0:0$0:0"
    |
    |`----------------.
    |   %3:gpr = LDD %2:gpr, 0
    |   %4:gpr = ADD_rr %0:gpr(tied-def 0), killed %3:gpr <--- (1)
    |   %5:gpr = LDD killed %4:gpr, 0       ^^^^^^^^^^^^^
    |   STD killed %5:gpr, %1:gpr, 0        this is updated
     `----------------.
        %6:gpr = LDD %2:gpr, 0
        %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %6:gpr <--- (2)
        %8:gpr = LDD killed %7:gpr, 0       ^^^^^^^^^^^^^
        STD killed %8:gpr, %1:gpr, 0        this is updated

Instructions (1) and (2) would be updated to:

    ADD_rr %0:gpr(tied-def 0), killed %2:gpr

The `killed` mark is inherited from machine operands `killed %3:gpr`
and `killed %6:gpr`, which are updated in place by `processDstReg()`.

This commit updates `processDstReg()` to reset kill marks for updated
machine operands to keep liveness information conservatively correct.
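
The idea of resetting kill marks while rewriting operands can be sketched with a toy model. The `Operand` struct and `replaceRegAndClearKill` helper below are hypothetical stand-ins for illustration only, not LLVM's `MachineOperand` API.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a register use operand (hypothetical, not LLVM's type).
struct Operand {
  unsigned Reg;
  bool IsKill;
};

// Rewrites every use of OldReg to NewReg in place and clears the kill flag,
// since NewReg may still be read by later instructions. Returns the number
// of operands that were rewritten.
unsigned replaceRegAndClearKill(std::vector<Operand> &Uses, unsigned OldReg,
                                unsigned NewReg) {
  assert(OldReg != NewReg && "nothing to rewrite");
  unsigned Changed = 0;
  for (Operand &Op : Uses) {
    if (Op.Reg != OldReg)
      continue;
    Op.Reg = NewReg;
    Op.IsKill = false; // conservative: never claim the last use
    ++Changed;
  }
  return Changed;
}
```

Dropping a kill flag is always safe for correctness, whereas keeping a stale one lets a later use read a register that is already marked dead.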

Differential Revision: https://reviews.llvm.org/D157805
2023-08-15 02:23:38 +03:00
Anmol P. Paralkar
53e89f5e3f [RISCV] Add bounds check before use on returned iterator.
Check iterator validity before use; fixes a crash seen in the RISC-V
Zcmp Push/Pop optimization pass when compiling an internal benchmark.

Reviewed By: asb, wangpc

Differential Revision: https://reviews.llvm.org/D157674
2023-08-14 16:06:09 -07:00
Matt Arsenault
1faa4797ca AMDGPU: Handle unsafe exp.f32 with denormal handling
I somehow missed this path when adding the new expansions. Saves a lot
of instructions for afn + IEEE.

https://reviews.llvm.org/D157867
2023-08-14 18:36:01 -04:00
Matt Arsenault
d45022b094 AMDGPU: Remove special case constant folding of divide
We should probably just swap this out for the fdiv, but that's what
the implementation is anyway.
2023-08-14 18:36:01 -04:00
Matt Arsenault
0eabe65bfb AMDGPU: Replace ldexp libcalls with intrinsic 2023-08-14 18:36:01 -04:00
Matt Arsenault
f337a77c99 AMDGPU: Replace rounding libcalls with intrinsics 2023-08-14 18:36:01 -04:00
Matt Arsenault
c7876c55ac AMDGPU: Replace fabs and copysign libcalls with intrinsics
Preserves flags and metadata like the other cases.
2023-08-14 18:28:21 -04:00
Matt Arsenault
a70006c4c5 AMDGPU: Replace some libcalls with intrinsics
OpenCL loses fast math information by going through libcall wrappers
around intrinsics.

Do this to preserve call site flags which are lost when inlining. It's
not safe in general to propagate flags during inline, so avoid dealing
with this by just special casing some of the useful calls.
2023-08-14 18:20:47 -04:00
Craig Topper
9cf375b310 [RISCV][GISel] Narrow G_SEXT_INREG to XLenLLT before lowering.
If we lower, we need to legalize the wide shifts which
is costly.

This will improve the tests from https://reviews.llvm.org/D157415 too

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D157677
2023-08-14 14:49:00 -07:00
Craig Topper
6299650f97 [DAGCombiner] Fold trunc(undef) -> undef.
We already do this in getNode, but the undef might appear during
another DAGCombine.

While here, remove the code for handling noop truncates. getNode checks
the types and won't create a noop truncate.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D157910
2023-08-14 13:02:24 -07:00
Philip Reames
a63bd7e99b [RISCV] Use NoReg in place of IMPLICIT_DEF for undefined passthru operands
In a recent series of refactorings (described here: https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295), I greatly increased the number of IMPLICIT_DEF operands to our vector instructions. This has turned out to have an unexpected negative impact because MachineCSE does not CSE IMPLICIT_DEFs, and thus does not CSE any instruction with an IMPLICIT_DEF operand. SelectionDAG *does* CSE the same case, but that only covers the same block case, not the cross block case. This led to the performance regression reported in https://github.com/llvm/llvm-project/issues/64282.

This change is a slightly ugly hack to sidestep the issue. Instead of fixing the root cause (lack of CSE for IMPLICIT_DEF) or undoing the operand changes, we leave the extra operand in place, and use NoReg in place of IMPLICIT_DEF. I then convert back to IMPLICIT_DEF just before register allocation so that ProcessImplicitDefs and TwoAddressInstructions can do the normal transforms to Undef tied registers.

We may end up backporting this into the 17.x release branch.  Given how late in the release cycle this is landing, that's much less likely now, but still a possibility.

Differential Revision: https://reviews.llvm.org/D156909
2023-08-14 12:57:38 -07:00
Joe Nash
dc242f9f1e [AMDGPU][NFC] Convert fpto{u|s}i f16 tests to auto-gen
Makes it easier to add a GFX11 RUN line in a future patch, which has
significantly different output.
2023-08-14 15:28:20 -04:00
Matt Arsenault
a8376bbe53 AMDGPU: Add baseline tests for libcall to intrinsic handling
Test all the different itanium mangled opencl functions that are
interesting to replace with raw intrinsic calls.

https://reviews.llvm.org/D157873
2023-08-14 15:15:30 -04:00
Craig Topper
e4b2f2d4a6 [RISCV][GISel] Legalize G_PHI and G_BRCOND.
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D157818
2023-08-14 10:21:58 -07:00
Craig Topper
1fa858d987 [RISCV][GISel] Make G_CONSTANT of pointers legal.
This is needed to support things like null pointers.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D157822
2023-08-14 10:11:29 -07:00
David Green
3f8210921e [AArch64] Add FeatureFuseAdrpAdd for NeoverseV2
As in all the other cpus from D134521, this adds FeatureFuseAdrpAdd to
NeoverseV2 to allow more linker relaxations.
2023-08-14 17:21:25 +01:00
Matt Arsenault
f44beecb78 AMDGPU: Try to use private version of sincos if available
The comment was out of date; the device libs build does provide all
the pointer overloads. An extremely pedantic interpretation of the
spec would suggest only the flat version exists, but the overloads do
exist in the implementation.

https://reviews.llvm.org/D156720
2023-08-14 11:40:04 -04:00
Luke Lau
9f369a4c43 [RISCV] Lower reverse shuffles of fixed i1 vectors to vbrev.v
If we can fit an entire vector of i1 into a single element, e.g. v32i1 ->
v1i32, then we can reverse it via vbrev.v.
We need to handle the case where the vector doesn't exactly fit into the larger
element type, e.g. v4i1 -> v1i8. In this case we shift up the reversed bits
afterwards.
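
A scalar sketch of the same trick, using one byte as the container element. It assumes element 0 of the mask lives in the least significant bit, so after reversing all 8 bits the result is moved back into the low `NumElts` bits with a logical right shift. The bit ordering, the shift direction in this model, and the function names are assumptions of the sketch, not statements from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Reverse all 8 bits of a byte (scalar stand-in for a bit-reverse on one
// i8 element).
uint8_t reverseBits8(uint8_t X) {
  uint8_t R = 0;
  for (int I = 0; I < 8; ++I) {
    R = static_cast<uint8_t>((R << 1) | (X & 1u));
    X = static_cast<uint8_t>(X >> 1);
  }
  return R;
}

// Reverse a mask of NumElts bits packed into the low bits of a byte.
uint8_t reverseMask(uint8_t Packed, unsigned NumElts) {
  assert(NumElts >= 1 && NumElts <= 8);
  assert((NumElts == 8 || (Packed >> NumElts) == 0) && "unused bits set");
  uint8_t Reversed = reverseBits8(Packed);
  // The reversed mask sits in the top NumElts bits; move it back down when
  // the mask does not fill the whole element.
  return static_cast<uint8_t>(Reversed >> (8 - NumElts));
}
```

For a 4-element mask `0b0011`, the full byte reversal gives `0b11000000`, and shifting by the 4 unused bits gives `0b1100`.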

Reviewed By: fakepaper56, 4vtomat

Differential Revision: https://reviews.llvm.org/D157614
2023-08-14 16:36:58 +01:00
Matt Arsenault
58fd1de09f AMDGPU: Consider nobuiltin when querying defined libfuncs
https://reviews.llvm.org/D156708
2023-08-14 11:30:12 -04:00
Matt Arsenault
42c6e4209c AMDGPU: Handle multiple uses when matching sincos
Match how the generic implementation handles this. We now will leave
behind the dead other user for later passes to deal with.

https://reviews.llvm.org/D156707
2023-08-14 11:28:41 -04:00
Dinar Temirbulatov
f598b616e0 [AArch64][SME] Non-streaming compatible SCVTF emitted with --force-streaming-compatible-sve
For scalar integer-to-float conversions under Streaming Compatible SVE, use
the non-NEON version of the convert instruction.

Differential Revision: https://reviews.llvm.org/D157698
2023-08-14 13:49:57 +00:00
David Green
a3f2751f78 [AArch64][GISel] Add handling for G_VECREDUCE_FMAXIMUM and G_VECREDUCE_FMINIMUM
This is a lot of copy-pasting for the existing handling of
G_VECREDUCE_FMAX/G_VECREDUCE_FMIN to add handling for
G_VECREDUCE_FMAXIMUM/G_VECREDUCE_FMINIMUM in the same way.

Differential Revision: https://reviews.llvm.org/D156615
2023-08-14 10:03:25 +01:00
Luke Lau
6238b8ea63 [LegalizeTypes] Factor in vscale_range when widening insert_subvector
Currently when widening operands for insert_subvector nodes, we check
first that the indices are valid by seeing if the subvector is
statically known to be smaller than or equal to the in-place vector.

However if we're inserting a fixed subvector into a scalable vector we rely on
the minimum vector length of the latter. This patch extends the widening logic
to also take into account the minimum vscale from the vscale_range attribute,
so we can handle more scenarios where we know the scalable vector is large
enough to contain the subvector.
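
The size check described above can be sketched as simple arithmetic. The function name and parameters are hypothetical, and folding the insertion index into the check is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// A scalable vector <vscale x MinElts x T> is known to hold at least
// MinElts * MinVScale elements when vscale_range gives a lower bound
// MinVScale. A fixed subvector of SubElts elements inserted at index Idx is
// known to fit if it ends within that guaranteed prefix.
bool fixedSubvectorKnownToFit(uint64_t SubElts, uint64_t Idx,
                              uint64_t MinElts, uint64_t MinVScale) {
  assert(MinVScale >= 1 && "vscale is at least 1");
  uint64_t KnownElts = MinElts * MinVScale;
  return Idx + SubElts <= KnownElts;
}
```

With `MinVScale = 1` this reduces to the check that only relies on the minimum vector length. A larger lower bound lets an 8-element fixed subvector fit into a `<vscale x 4 x T>` vector once vscale is known to be at least 2.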

Fixes https://github.com/llvm/llvm-project/issues/63437

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153519
2023-08-14 09:58:15 +01:00
David Green
d199478af4 [AArch64][GISel] Handling for G_VECREDUCE_FMIN and G_VECREDUCE_FMAX
This adds legalization for G_VECREDUCE_FMIN and G_VECREDUCE_FMAX, where the
selection can go via tablegen patterns. I haven't tried to get non-power2 types
working yet, just the more legal types.

Differential Revision: https://reviews.llvm.org/D156614
2023-08-14 09:19:47 +01:00
Nikita Popov
9deee6bffa [SDAG] Don't transfer !range metadata without !noundef to SDAG (PR64589)
D141386 changed the semantics of !range metadata to return poison
on violation. If !range is combined with !noundef, violation is
immediate UB instead, matching the old semantics.

In theory, these IR semantics should also carry over into SDAG.
In practice, DAGCombine has at least one key transform that is
invalid in the presence of poison, namely the conversion of logical
and/or to bitwise and/or (c7b537bf09/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (L11252)).
Ideally, we would fix this transform, but this will require
substantial work to avoid codegen regressions.

In the meantime, avoid transferring !range metadata without
!noundef, effectively restoring the old !range metadata semantics
on the SDAG layer.

Fixes https://github.com/llvm/llvm-project/issues/64589.

Differential Revision: https://reviews.llvm.org/D157685
2023-08-14 09:04:27 +02:00
LWenH
555e0305fd [RISCV] Match ext + ext + srem + trunc to vrem.vv
This patch matches the SDNode pattern "trunc (srem(sext, ext))" to vrem.vv. This can remove the extra "vsext", "vnsrl", and "vsetvli" instructions in cases like "c[i] = a[i] % b[i]", where the element types of the arrays are all int8_t or all int16_t.

For element types like uint8_t or uint16_t, the redundant "zext + zext + urem + trunc" IR is already removed by the InstCombine pass, because the urem operation cannot overflow in LLVM. For signed types, however, InstCombine cannot remove such patterns because of the potential for undefined behavior in LLVM IR. For example, -128 % -1 overflows and is undefined behavior under the i8 type in LLVM IR, but this situation doesn't occur for i32. To prevent the UB, LLVM first sign-extends the srem operands to i32.

For RVV, such overflowing operations are already defined by the specification and yield deterministic output for extreme inputs. For example, per the spec, -128 % -1 under the i8 type yields 0 in the overflow case. Therefore, such patterns can be matched in the instruction selection phase of the RVV backend rather than removed in target-independent optimization passes like InstCombine.
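
The equivalence can be illustrated with a scalar sketch. `sremWidened` mirrors the "sext + sext + srem + trunc" sequence, and `sremNarrowModel` is a hypothetical model of an 8-bit remainder whose overflow case is defined to return 0, as the message describes for RVV. Both names are made up for this illustration, and division by zero is excluded by assertion.

```cpp
#include <cassert>
#include <cstdint>

// Widened form: sign-extend both operands to 32 bits, take the remainder,
// then truncate back to 8 bits. No overflow is possible at 32 bits for
// 8-bit inputs.
int8_t sremWidened(int8_t A, int8_t B) {
  assert(B != 0);
  int32_t Wide = static_cast<int32_t>(A) % static_cast<int32_t>(B);
  return static_cast<int8_t>(Wide);
}

// Assumed model of a narrow remainder whose overflow case
// (INT8_MIN % -1) is defined to produce 0.
int8_t sremNarrowModel(int8_t A, int8_t B) {
  assert(B != 0);
  if (A == INT8_MIN && B == -1)
    return 0;
  return static_cast<int8_t>(A % B);
}
```

Under these assumptions the two functions agree for every nonzero divisor, including the overflow input -128 % -1, which is why the widened sequence can be replaced by a single narrow remainder.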

This patch only handles the sign-extension case for srem. For more information about the C test cases compared with GCC, please see: https://gcc.godbolt.org/z/MWzE7WaT4

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D156685
2023-08-13 22:14:43 -07:00
LWenH
6cb55a3d9a [RISCV] Add Precommit test for D156685
Add baseline test for [[ https://reviews.llvm.org/D156685 | D156685 ]].

In LLVM, such a signed 8-bit remainder operation first sign-extends the operands to 32 bits, and then the CorrelatedValuePropagation pass narrows the operands to a smaller data type such as 16 bits to optimize the final data storage size.

This sign extension for srem in LLVM is there to prevent undefined behavior. For example, -128 % -1 is undefined behavior under the i8 type in LLVM IR, but this won't happen for i32, so such a pattern cannot be eliminated in the platform-independent InstCombine pass. The LLVM IR of these sext/trunc operations is translated one by one during RVV backend code generation, and redundant vsetvli instructions are inserted.

In fact, according to the RVV instruction manual, the vrem.vv instruction already specifies the final output value of this type of overflowing operation. For example, -128 % -1 yields 0 according to the RISC-V spec, so through this patch, I think we can optimize this redundant RVV code through SDNode pattern matching at the instruction selection phase.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D157592
2023-08-13 22:14:38 -07:00
wangpc
8a98f24ec5 [RISCV] Truncate constants to EltSize when combine store of BUILD_VECTOR
The constants can be with larger bit width, so we need to truncate
them to EltSize or we will exceed the width of fixed-length vector.

Fixes #64588

Reviewed By: luke, craig.topper, bjope, michaelmaitland

Differential Revision: https://reviews.llvm.org/D157603
2023-08-14 10:55:53 +08:00
Shengchen Kan
fda9a9c61e [X86][Codegen] Remove dead code for ADCX/ADOX
There is no pattern for ADCX/ADOX and they are never selected during
ISEL. So we remove the cases in some MIR optimizations in this patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D157717
2023-08-14 10:23:42 +08:00
Karl-Johan Johnsson
917574d5d8 [MachineLICM][WinEH] Don't hoist register reloads out of funclets
This fixes https://github.com/llvm/llvm-project/issues/60766

With MSVC-style exception handling (funclets), no registers are
alive when entering the funclet, so they must be reloaded from the
stack.  MachineLICM can sometimes hoist such reloads out of the
funclet, which is incorrect because the register will have been
clobbered when entering the funclet.  This can happen in any loop
that contains a try-catch.

This has been tested on x86_64-pc-windows-msvc.  I'm not sure if
funclets work the same on the other Windows archs.

Reviewed By: rnk, arsenm

Differential Revision: https://reviews.llvm.org/D153337
2023-08-13 23:58:16 +03:00
Simon Pilgrim
512a6c50e8 [X86] combineToExtendBoolVectorInReg - don't use changeVectorElementType to create the bool vector type
Converting a (simple) vXf32 type to a vXi1 type isn't guaranteed to be simple, causing the MVT type to be invalid.

Fixes #64627
2023-08-13 11:18:59 +01:00
Christudasan Devadasan
bd7c6e3c48 [AMDGPU] Precommit lit test for wwm-reg AV spill pseudos D155646. 2023-08-12 16:18:18 +05:30
Philip Reames
84a2b55b0d [RISCV] Add test coverage for matching strided loads with negative offsets 2023-08-11 15:27:01 -07:00
Konrad Kusiak
4fa8a5487e [AMDGPU] Add sanity check that fixes bad shift operation in AMD backend
There is a problem with the
SILoadStoreOptimizer::dmasksCanBeCombined() function that can lead to
UB.

This boolean function decides if two masks can be combined into 1. The
idea here is that the bits which are "on" in one mask, don't overlap
with the "on" bits of the other. Consider an example (10 bits for
simplicity):

Mask 1: 0101101000
Mask 2: 0000000110

Those can be combined into a single mask: 0101101110.

To check if such an operation is possible, the code takes the mask
which is greater and counts how many 0s there are, starting from the
LSB and stopping at the first 1. Then, it shifts 1u by this number and
compares it with the smaller mask. The problem is that when both masks
are 0, the counter will find 32 zeroes in the first mask and will try
to do a shift by 32 positions which leads to UB.

The fix is a simple sanity check on whether the bigger mask is 0.
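
A standalone sketch of the check described above, including the guard. This is an illustration written from the description, not the code in `SILoadStoreOptimizer`. The helper names are hypothetical, and returning `false` when the bigger mask is 0 is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of consecutive zero bits starting at the LSB; 32 when Mask == 0.
unsigned countTrailingZeros32(uint32_t Mask) {
  unsigned Count = 0;
  while (Count < 32 && ((Mask >> Count) & 1u) == 0)
    ++Count;
  return Count;
}

// Returns true if all set bits of the smaller mask lie strictly below the
// lowest set bit of the bigger mask.
bool masksCanBeCombined(uint32_t A, uint32_t B) {
  uint32_t MaxMask = std::max(A, B);
  uint32_t MinMask = std::min(A, B);
  // Sanity check: with MaxMask == 0 the count below would be 32, and
  // shifting a 32-bit value by 32 is undefined behavior.
  if (MaxMask == 0)
    return false; // assumed result for the degenerate case
  unsigned AllowedBits = countTrailingZeros32(MaxMask);
  assert(AllowedBits < 32);
  return (uint32_t{1} << AllowedBits) > MinMask;
}
```

With the 10-bit example from the message, `0x168` (0101101000) has three trailing zeros, so the smaller mask must be below 8, and `0x006` (0000000110) qualifies.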

https://reviews.llvm.org/D155051
2023-08-11 15:26:35 -04:00
Mirko Brkusanin
1e5359c6ba [AMDGPU] Treat KIMM32 and KIMM16 operand types as noninlinable
While they represent 32/16-bit immediate values, they are already
included in the encoding of the instructions that use them and are not true
literals. FMAMK and FMAAK instructions that use them are marked with a fixed
size so getInstSizeInBytes will not increase the size for these operands.

We also add tests whose logic relies on KIMM16 and KIMM32 being considered
not inlinable.

Differential Revision: https://reviews.llvm.org/D157624
2023-08-11 18:46:39 +02:00
Jeffrey Byrnes
f76ffc1f40 [MCP] Invalidate copy for super register in copy source
We must also track the super sources of a copy, otherwise we introduce a sort of subtle bug.

Consider:

1.  DEF r0:r1
2.  USE r1
3.  r6:r9 = COPY r10:r13
4.  r14:r15 = COPY r0:r1
5.  USE r6
6.  r1:r4 = COPY r6:r9

BackwardCopyPropagateBlock processes the instructions from bottom up. After processing 6., we will have a propagatable copy for r1-r4 and r6-r9. After 5., we invalidate and erase the propagatable copy for r1-r4 and r6 but not for r7-r9.

The issue is that when processing 3., data structures still say we have valid copies for dest regs r7-r9 (from 6.). The corresponding defs for these registers in 6. are r1:r4, which we mark as registers to invalidate. When invalidating, we find the copy that corresponds to r1 is 4. (this was added when processing 4.), and we say that r1 now maps to unpropagatable copies. Thus, when we process 2., we do not have a valid copy, but when we process 1. we do -- because the mapped copy for subregister r0 was never invalidated.

The net result is to propagate the copy from 4. to 1., and replace DEF r0:r1 with DEF r14:r15. Then, we have a use before def in 2.

The main issue is that we have an inconsistent state between which def regs and which src regs are valid. When processing 5., we mark all the defs in 6. as invalid, but only the subreg use as invalid. Either we must only invalidate the individual subreg for both uses and defs, or the super register for both.

Differential Revision: https://reviews.llvm.org//D157564

Change-Id: I99d5e0b1a0d735e8ea3bd7d137b6464690aa9486
2023-08-11 09:01:18 -07:00
Jeffrey Byrnes
d0e54e377b [AMDGPU] Extend CalculateByteProvider to capture vectors and signed
Differential Revision: https://reviews.llvm.org/D157133

Change-Id: I9ba8727b4ac5a627de2f7d87d2169eb79e01f0ee
2023-08-11 08:47:17 -07:00
Joe Nash
2fb4bfa5ba [AMDGPU][True16] Fix ISel for A16 Image Instructions
The 16-bit VAddr arguments to A16 image instructions are packed into
legal VGPR_32 operands in AMDGPULegalizerInfo::legalizeImageIntrinsic on
all subtargets. With True16, we also need to pack if the number of VAddr is one
because VGPR_16 is not a legal argument to those Image instructions.

No change to emitted code intended on subtargets pre-GFX11, and none on GFX11
until True16 is active.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D157426
2023-08-11 11:12:16 -04:00
Matt Devereau
c52d9509d4 [AArch64][SVE] Add asm predicate constraint Uph
Some instructions such as multi-vector LD1 only accept a range
of PN8-PN15 predicate-as-counter. This new constraint allows more
refined parsing and better decision making when parsing these
instructions from ASM, instead of defaulting to Upa which incorrectly
uses the whole range of registers P0-P15 from the register class PPR.

Differential Revision: https://reviews.llvm.org/D157517
2023-08-11 14:48:19 +00:00
Matt Arsenault
8f18cf77e7 AMDGPU: Check for implicit defs before constant folding instruction
Can't delete the constant folded instruction if scc is used.

Fixes #63986

https://reviews.llvm.org/D157504
2023-08-11 10:29:53 -04:00
Matt Arsenault
1030483561 AMDGPU/GlobalISel: Handle stacksave/stackrestore
https://reviews.llvm.org/D156670
2023-08-11 10:25:01 -04:00
Matt Arsenault
9a53f5f5c4 AMDGPU: Handle llvm.stacksave and llvm.stackrestore
Not sure if the only valid use is to have stackrestore directly
consume stacksave outputs or not. Handled exactly like a regular stack
pointer so all the edge cases theoretically should work.

https://reviews.llvm.org/D156669
2023-08-11 10:25:01 -04:00