47405 Commits

Author SHA1 Message Date
Luo, Yuanke
614c63bec6 [X86] Create extra prolog/epilog for stack realignment [part 2]
This patch extends D145650 to support ELF targets as well.

Differential Revision: https://reviews.llvm.org/D146489
2023-03-21 13:43:39 +08:00
Congcong Cai
d9661d79f4 [Webassembly][multivalue] update libcall signature when multivalue feature enabled
Fixes: #59095
Update libcall signatures to use multivalue return rather than returning via a pointer
when the multivalue feature is enabled in the WebAssembly backend.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D146271
2023-03-21 12:10:51 +08:00
Ben Shi
4fa9dc9482 [AVR] Fix incorrect expansion of the pseudo 'ELPMBRdZ' instruction
The 'ELPM' instruction has three forms:

--------------------------
| form        | feature  |
| ----------- | -------- |
| ELPM        | hasELPM  |
| ELPM Rd, Z  | hasELPMX |
| ELPM Rd, Z+ | hasELPMX |
--------------------------

The second form is always used in the expansion of the pseudo
instruction 'ELPMBRdZ'. However, devices that have ELPM but not ELPMX
can only emit the first form.
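
The selection rule above can be sketched as follows (Python, for illustration only). The function name and the string form of the instructions are hypothetical. The sketch assumes, per the AVR instruction set rather than this patch, that the operand-less ELPM loads into r0, so a copy to the destination is needed afterwards. It leaves out any RAMPZ setup.

```python
def expand_elpmbrdz(rd, has_elpm, has_elpmx):
    """Return the instructions a byte load through Z could expand to (hypothetical helper)."""
    if has_elpmx:
        # Second form: load directly into the destination register.
        return [f"elpm {rd}, Z"]
    if has_elpm:
        # First form only: the destination is implicitly r0, so copy it to rd.
        insts = ["elpm"]
        if rd != "r0":
            insts.append(f"mov {rd}, r0")
        return insts
    raise ValueError("device has neither ELPM nor ELPMX")
```

For example, expand_elpmbrdz("r24", True, False) yields the two-instruction sequence, while a device with ELPMX gets the single direct load.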

Reviewed By: jacquesguan

Differential Revision: https://reviews.llvm.org/D141221
2023-03-21 11:33:56 +08:00
Luo, Yuanke
e4c1dfed38 [X86] Create extra prolog/epilog for stack realignment
The base pointer register is reserved by the compiler when a function
has both a dynamically sized alloca and stack realignment. However, the
base pointer register is not defined in the X86 ABI, so users can use
this register in inline assembly. The inline assembly would then
clobber the base pointer register without the user being aware of it.
This patch creates an extra prolog that saves the stack pointer to a
scratch register and uses this register to reference arguments on the
stack. For some calling conventions (e.g. regcall), there may be
few scratch registers available.
Below is example code for such a case.

```
extern int bar(void *p);
long long foo(size_t size, char c, int id) {
  __attribute__((__aligned__(64))) int a;
  char *p = (char *)alloca(size);
  asm volatile ("nop"::"S"(405):);
  asm volatile ("movl %0, %1"::"r"(id), "m"(a):);
  p[2] = 8;
  memset(p, c, size);
  return bar(p);
}
```
The following prolog/epilog will be emitted for this case.
```
leal    4(%esp), %ebx
.cfi_def_cfa %ebx, 0
andl    $-128, %esp
pushl   -4(%ebx)
...
leal    -4(%ebx), %esp
.cfi_def_cfa %esp, 4
```
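
The pointer arithmetic of this prolog/epilog can be checked with a small model (Python, for illustration only). The function names are hypothetical. The epilog is modelled from the two CFI directives alone: the CFA equals the scratch register in the prolog and equals %esp + 4 after the epilog, so the restored stack pointer is the scratch value minus 4.

```python
MASK32 = 0xFFFFFFFF

def realign_prolog(esp, alignment=128):
    """Return (scratch, new_esp) after the extra prolog (hypothetical model)."""
    scratch = (esp + 4) & MASK32            # leal 4(%esp), %ebx: address of the first stack argument (the CFA)
    aligned = esp & (-alignment & MASK32)   # andl $-128, %esp: round the stack pointer down
    new_esp = (aligned - 4) & MASK32        # pushl -4(%ebx): re-push the return address
    return scratch, new_esp

def realign_epilog(scratch):
    """Return the stack pointer implied by '.cfi_def_cfa %esp, 4' with CFA == scratch."""
    return (scratch - 4) & MASK32

entry_esp = 0xFFFFD0EC
scratch, body_esp = realign_prolog(entry_esp)
restored_esp = realign_epilog(scratch)
```

In this model the epilog returns the stack pointer to its value at function entry, which is where the return address lives.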

Differential Revision: https://reviews.llvm.org/D145650
2023-03-21 08:09:56 +08:00
Nemanja Ivanovic
6ee4ea8e2f [PowerPC][NFC] Test needs to include constant pool values 2023-03-20 16:43:59 -05:00
Nemanja Ivanovic
da40f7e8b1 [PowerPC][NFC] Pre-commit a test case for upcoming patch 2023-03-20 15:42:07 -05:00
David Green
cd22e7c3ad [AArch64] Regenerate neon-vcmla.ll tests and add tests for combining fadd with vcmla. NFC
See D146407.
2023-03-20 16:29:28 +00:00
Muhammad Omair Javaid
8d6ab7d519 Revert "Revert "[SVE] Add patterns for shift intrinsics with FalseLanesZero mode""
This reverts commit 32bd1f562f835044d11b7ecfb36362a29eb00a02.
2023-03-20 15:33:20 +05:00
Muhammad Omair Javaid
32bd1f562f Revert "[SVE] Add patterns for shift intrinsics with FalseLanesZero mode"
This reverts commit 22c3ba4bb519e12395c676ffe436ea4b8400234a.

Breaks buildbot https://lab.llvm.org/buildbot/#/builders/197/builds/4272

Differential Revision: https://reviews.llvm.org/D145551
2023-03-20 12:39:39 +05:00
lizhijin
22c3ba4bb5 [SVE] Add patterns for shift intrinsics with FalseLanesZero mode
This patch adds patterns to reduce redundant mov and sel instructions
for shift intrinsics with FalseLanesZero mode, when
FeatureExperimentalZeroingPseudos is supported.

For example, before:

mov     z1.b, #0
sel     z0.b, p0, z0.b, z1.b
asr     z0.b, p0/m, z0.b, #7
After:

movprfx z0.b, p0/z, z0.b
asr     z0.b, p0/m, z0.b, #7
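
A lane-by-lane model of why the two sequences agree (Python, for illustration only). The helper names are hypothetical, vectors are modelled as lists of signed byte values, and movprfx is treated as a separate zeroing move, which is a simplification.

```python
def asr_merging(z, pred, shift):
    # asr z, p/m, z, #shift: active lanes are shifted, inactive lanes keep their value.
    return [(v >> shift) if p else v for v, p in zip(z, pred)]

def before(z0, p0, shift):
    z1 = [0] * len(z0)                                   # mov z1.b, #0
    z0 = [a if p else b for a, b, p in zip(z0, z1, p0)]  # sel z0.b, p0, z0.b, z1.b
    return asr_merging(z0, p0, shift)                    # asr z0.b, p0/m, z0.b, #7

def after(z0, p0, shift):
    z0 = [v if p else 0 for v, p in zip(z0, p0)]         # movprfx z0.b, p0/z, z0.b
    return asr_merging(z0, p0, shift)                    # asr z0.b, p0/m, z0.b, #7

lanes = [-128, 127, -1, 64]
pred = [True, False, True, False]
```

Both functions zero the inactive lanes and shift the active ones, so before(lanes, pred, 7) and after(lanes, pred, 7) give the same vector.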

Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D145551
2023-03-19 13:49:01 +08:00
Austin Kerbow
864a2b25be [AMDGPU] Reserve extra SGPR blocks with XNACK "any" TID Setting
ASMPrinter was relying on feature bits to set up extra SGPRs in the kernel
descriptor for the xnack_mask. This was broken for the dynamic XNACK "any" TID
setting which could cause user SGPRs to be clobbered if the number of SGPRs
reserved was near a granulated block boundary.
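
A rough sketch of the boundary effect described above (Python, for illustration only). The granule size of 16 registers and the 2 extra registers for xnack_mask are assumptions chosen for the example, not values taken from this patch, and the helper names are hypothetical.

```python
def granulated_blocks(num_sgprs, granule=16):
    """Number of granule-sized blocks needed to hold num_sgprs registers."""
    return max(1, -(-num_sgprs // granule))  # ceiling division, at least one block

def blocks_for_kernel(used_sgprs, xnack_setting, extra_for_xnack=2):
    # "on" and "any" both need room for xnack_mask; only "off" can skip it.
    extra = 0 if xnack_setting == "off" else extra_for_xnack
    return granulated_blocks(used_sgprs + extra)
```

With these assumed numbers, 15 used SGPRs fit in one block when the extra registers are ignored, but need two blocks once xnack_mask is accounted for. That is the kind of case in which under-reserving lets registers be clobbered.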

When XNACK was enabled this worked correctly in the ASMParser which meant some
kernels were only failing without "-save-temps".

Fixes: SWDEV-382764

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D145401
2023-03-17 20:26:23 -07:00
Heejin Ahn
4e844a1498 [WebAssembly] Replace Bugzilla links with Github issues
Reviewed By: dschuff, asb

Differential Revision: https://reviews.llvm.org/D145966
2023-03-17 20:13:00 -07:00
Pavel Kopyl
7adacaa098 [NVPTX] Report fatal error on empty argument type.
Differential Revision: https://reviews.llvm.org/D146331
2023-03-18 01:27:43 +01:00
Craig Topper
101cf0b8ab [RISCV] Add isReMaterializable to FLI instructions.
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D146321
2023-03-17 12:16:37 -07:00
Craig Topper
0a895c39ad [RISCV] Add isAsCheapAsAMove to FLI instructions.
This can prevent unnecessarily hoisting out of loops.

Test case cribbed from AArch64.

I also intend to make them rematerializable.

Differential Revision: https://reviews.llvm.org/D146314
2023-03-17 12:16:14 -07:00
Craig Topper
f36ec414c9 [RISCV] Add test case showing fli being hoisted out of a loop and creating extra copies/spills.
Test case for D146314.

Differential Revision: https://reviews.llvm.org/D146315
2023-03-17 12:16:14 -07:00
Matt Arsenault
9356ec1516 CodeGen: Reorder case handling for is.fpclass legalization
Subnormal and zero checks can be combined into one, so move
the code closer to reduce the diff in a future change.
2023-03-17 11:29:50 -04:00
Krzysztof Parzyszek
0eac3c5004 [Hexagon] Ensure proper ordering of instructions in HVC::AlignVectors
The shuffle reduction creates a dependency chain. Make sure that the
inputs to the next instruction are placed ahead of the instruction itself.
2023-03-17 08:13:49 -07:00
Nikita Popov
687b5b9a0c [SCEVExpander] Always use scevgep as name
With opaque pointers the scevgep / uglygep distinction no longer
makes sense -- GEPs are always emitted in offset-based representation.
2023-03-17 14:27:03 +01:00
Craig Topper
4063369fd4 [RISCV] Add MULW to RISCVStripWSuffix.
This converts MULW to MUL if the upper bits aren't used.
This will give more opportunities to use c.mul with Zcb.
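
Why the conversion is safe when the upper bits are unused can be checked with a small model (Python, for illustration only; the helper names are hypothetical). MUL and MULW always agree on the low 32 bits of the result and can only differ in the upper 32 bits.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def mul(a, b):
    # MUL: low 64 bits of the product.
    return (a * b) & MASK64

def mulw(a, b):
    # MULW: multiply the low 32 bits, then sign-extend the 32-bit result to 64 bits.
    lo = ((a & MASK32) * (b & MASK32)) & MASK32
    if lo >> 31:
        return lo | (MASK64 ^ MASK32)
    return lo

a, b = 0x123456789ABCDEF0, 0xFFFFFFFF00000003
```

For these operands mul(a, b) & MASK32 equals mulw(a, b) & MASK32, so a user that only reads the low 32 bits cannot tell the two instructions apart.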
2023-03-16 19:42:33 -07:00
Vitaly Buka
aa15fe98b6 Revert "[AMDGPUUnifyDivergentExitNodes] Add NewPM support"
Introduces nullptr dereference.

This reverts commit a5455e32b364dabe499ec11722626d4bbaf047ba.
2023-03-16 19:03:46 -07:00
zhijian
49bc3077cb [AIX] unset bit "IsBackChainStored" of traceback table for leaf functions with no stack frame
Summary:

  In the function PPCAIXAsmPrinter::emitTracebackTable(), the "IsBackChainStored" bit of the traceback
table is always set to true. This causes the AIX debug tool "dbx" to emit the error
libdebug assertion "(framep->getGpr(STKP, &addr) == DB_SUCCESS && *nextStkpp == addr)"
when debugging a leaf function with no stack frame.

For a leaf function with no stack frame, the IsBackChainStored bit should be unset.

Reviewers: ChenZheng
Differential Revision: https://reviews.llvm.org/D146071
2023-03-16 15:26:12 -04:00
Mirko Brkusanin
d5c0c1b6f0 [AMDGPU] Select flat atomic fmin/fmax
Also disables global atomic fmin/fmax x2 patterns on gfx11

Differential Revision: https://reviews.llvm.org/D146137
2023-03-16 18:07:26 +01:00
Mikhail R. Gadelha
185ea867eb [RISCV] Fix missing addi in test to validate lower inline asm m with offset 2023-03-16 13:30:53 -03:00
Anshil Gandhi
a5455e32b3 [AMDGPUUnifyDivergentExitNodes] Add NewPM support
Meanwhile, use UniformityAnalysis instead of LegacyDivergenceAnalysis to collect divergence info.

Reviewed By: arsenm, sameerds

Differential Revision: https://reviews.llvm.org/D141355
2023-03-16 16:13:29 +00:00
Mikhail R. Gadelha
4bbee03d8a [RISCV] Added tests to validate lower inline asm m and A with offsets 2023-03-16 13:12:39 -03:00
Tim Northover
2d690684f6 Recommit DwarfEHPrepare: insert extra unwind paths for stack protector to instrument
This is a mitigation patch for
https://bugs.chromium.org/p/llvm/issues/detail?id=30, where existing stack
protection is skipped if a function is returned through by an unwinder rather
than the normal call/return path. The recent patch D139254 added the ability to
instrument a visible unwind path, at least in the IR case (I'm working on the
SelectionDAG instrumentation too) but there are still invisible unwinds it
can't reach.

So this patch adds logic to DwarfEHPrepare that goes through a function,
converting any call that might throw into an invoke to a simple resume cleanup,
and adding cleanup clauses to existing landingpads that lack them. Obviously we
don't really want to do this if it's wasted effort, so I also exposed
requiresStackProtector from the actual StackProtector code to skip the extra
paths if they won't be used.

Changes:
  * Move test to AArch64 directory as it relies on target presence.
  * Re-add Dominator-tree maintenance. Accidentally cherry-picked wrong patch.
  * Skip adding paths on Windows EH functions.

https://reviews.llvm.org/D143637
2023-03-16 13:43:17 +00:00
Tim Northover
e4b352a0b9 Revert "DwarfEHPrepare: insert extra unwind paths for stack protector to instrument"
It's caused more failures than are trivially fixable.

This reverts commit 203b6f31bb71ce63488eb96b303e000e91aee376.
2023-03-16 11:55:53 +00:00
Tim Northover
203b6f31bb DwarfEHPrepare: insert extra unwind paths for stack protector to instrument
This is a mitigation patch for
https://bugs.chromium.org/p/llvm/issues/detail?id=30, where existing stack
protection is skipped if a function is returned through by an unwinder rather
than the normal call/return path. The recent patch D139254 added the ability to
instrument a visible unwind path, at least in the IR case (I'm working on the
SelectionDAG instrumentation too) but there are still invisible unwinds it
can't reach.

So this patch adds logic to DwarfEHPrepare that goes through a function,
converting any call that might throw into an invoke to a simple resume cleanup,
and adding cleanup clauses to existing landingpads that lack them. Obviously we
don't really want to do this if it's wasted effort, so I also exposed
requiresStackProtector from the actual StackProtector code to skip the extra
paths if they won't be used.

https://reviews.llvm.org/D143637
2023-03-16 11:32:45 +00:00
Nikita Popov
bbfb13a5ff [ConstExpr] Remove select constant expression
This removes the select constant expression, as part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
Uses of this expression have already been removed in advance,
so this just removes related infrastructure and updates tests.

Differential Revision: https://reviews.llvm.org/D145382
2023-03-16 10:32:08 +01:00
WANG Xuerui
19e2ebbf45 [LoongArch] Emit bytepick for picking from concatenation of two values
It seems the ISA manual's pseudo-code description for the
`BYTEPICK.[WD]` instructions is inaccurate; the behavior described here
should be correct though. The instructions' names are misleading too
(they pick full GRLen-wide words instead of bytes; they just index by
bytes) but let's stick to the official names for now.

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D143880
2023-03-16 15:07:06 +08:00
WANG Xuerui
ff475a0dd9 [LoongArch] Add baseline tests for bytepick codegen. NFC
Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D143879
2023-03-16 15:07:06 +08:00
LiaoChunyu
fc9730376c [RISCV]Optimize (riscvisd::select_cc x, 0, ne, x, 1)
This patch reduces the number of unpredictable branches.
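
The node (select_cc x, 0, ne, x, 1) means "x if x != 0, otherwise 1". One branch-free form with the same result is x + (x == 0). Whether the patch emits exactly this form is an assumption here. The check below is Python, for illustration only, with hypothetical helper names.

```python
def select_cc_ne_zero(x):
    # (select_cc x, 0, ne, x, 1): x if x != 0, otherwise 1.
    return x if x != 0 else 1

def branch_free(x, xlen=64):
    # A seqz-style compare yields 0 or 1; adding it turns 0 into 1
    # and leaves every other value unchanged.
    return (x + int(x == 0)) & ((1 << xlen) - 1)

samples = [0, 1, 2, 0x80000000, (1 << 64) - 1]
```

Both functions agree on every value in samples, including zero and the all-ones value.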

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D146117
2023-03-16 10:56:26 +08:00
WANG Xuerui
db5dfec9d4 [Clang][LoongArch] Implement patchable function entry
Similar to D98610 for RISCV.

This is going to be required by the upcoming Linux/LoongArch
[[ https://git.kernel.org/linus/4733f09d88074 | support for dynamic ftrace ]].

Reviewed By: SixWeining, MaskRay

Differential Revision: https://reviews.llvm.org/D141785
2023-03-16 09:33:58 +08:00
Jon Roelofs
aba4e4d6c1 [AArch64] Add hex comments to mov-imm spellings in the InstPrinter
Differential Revision: https://reviews.llvm.org/D146105
2023-03-15 14:29:44 -07:00
Jon Roelofs
cdee83b015 Revert "[AArch64] Add hex comments to mov-imm spellings in the InstPrinter"
This reverts commit 1def3141135c072a1d3e51e82e113dd67b0def97.
2023-03-15 14:21:08 -07:00
Jon Roelofs
1def314113 [AArch64] Add hex comments to mov-imm spellings in the InstPrinter
Differential Revision: https://reviews.llvm.org/D146105
2023-03-15 14:08:51 -07:00
Zain Jaffal
4b09d7a8ac [AArch64] Change GeneratePerfectShuffle to return one destination operand for zip and transpose operations.
The added tests were crashing because the zip instruction was returning two destination operands. According to Arm, ZIP returns only one destination operand.

Reviewed By: dmgreen, fhahn

Differential Revision: https://reviews.llvm.org/D146055
2023-03-15 21:05:18 +00:00
Simon Pilgrim
5be5510098 [X86] lzcnt-cmp.ll - enable CMOV on 32-bit LZCNT tests
There are no 32-bit targets that have LZCNT but not CMOV, and this allows us to test the straight-line i64 pattern; otherwise we're doing the same branchy code as the 32-bit BSR test.
2023-03-15 18:14:53 +00:00
Simon Pilgrim
28a0d0e85a [DAG] Don't fold zext(logicalshift(zext(x),c)) -> logicalshift(zext(x),c) if the outer zext is free
Avoid widening the shift to a bigger type if the zext would be free anyway

Pulled out of D146121
2023-03-15 17:45:12 +00:00
Simon Pilgrim
2281286eb7 [X86] Add more thorough testing of the zext(logicalshift(zext(x),c)) -> logicalshift(zext(x),c) fold
Add tests for more extension combos, 64-bit targets and some illegal types
2023-03-15 17:20:42 +00:00
Paul Kirth
ade336d6e1 [codegen][riscv] Emit CFI directives when using shadow call stack
Currently we don't emit any CFI instructions for the SCS register when
enabling SCS on RISCV. This causes problems when unwinding, since the
SCS register isn't being handled properly.

Reviewed By: mcgrathr

Differential Revision: https://reviews.llvm.org/D145205
2023-03-15 17:10:23 +00:00
Konstantina Mitropoulou
6bc5aa592a [AMDGPU] Update mul.ll with auto-generated checks
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D145990
2023-03-15 08:16:28 -07:00
Simon Pilgrim
4ead58914c [X86] add-and-not.ll - add 32-bit test coverage 2023-03-15 15:13:02 +00:00
pvanhout
723a53caaf [AMDGPU] Avoid constant bus limitation on V_BFE GISel pattern
For D141247 - if that pattern was used by GISel it could cause constant bus limitation failures.
Just use inline immediates instead of S_MOV to avoid the issue.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D146131
2023-03-15 15:01:33 +01:00
Sander de Smalen
93b89bee47 [AArch64][SVE] Fix the indexed addressing mode when FI = 0.
This is an alternative fix to D145497, which also addresses
  https://github.com/llvm/llvm-project/issues/60918

In D124457 which added the original code for this, @efriedma pointed
out that it wasn't safe to assume that FI #0 would be allocated at offset
0, but that part of the patch went in without any changes.

The downside of this solution is that any access to an object on the
stack that has been allocated at SP + 0, still gets moved to a separate
register first, which degrades performance.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D146056
2023-03-15 13:39:43 +00:00
Simon Pilgrim
c1f81e7604 [DAG] mergeStore - peek through truncates when finding dead store(trunc(load())) patterns
Extend the existing store(load()) removal code to account for intermediate truncates that some targets won't remove with canCombineTruncStore - we only care about the load/store MemoryVT.

Fixes regression from D146121
2023-03-15 11:54:13 +00:00
Simon Pilgrim
70562607ab [DAG] Fold multiple insert_vector_elt of zero values into an AND mask
This also allows us to make use of the existing isVectorClearMaskLegal shuffle canonicalization
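
The equivalence behind this fold can be sketched as follows (Python, for illustration only; the helper names and the 32-bit lane width are assumptions). Writing zero into a set of lanes gives the same vector as ANDing with a mask that is all-ones everywhere except those lanes.

```python
def insert_zeros(vec, indices):
    # Chain of insert_vector_elt(vec, 0, i) for each index i.
    out = list(vec)
    for i in indices:
        out[i] = 0
    return out

def and_with_clear_mask(vec, indices, bits=32):
    # Single AND with a constant mask: zero in the cleared lanes, all-ones elsewhere.
    all_ones = (1 << bits) - 1
    mask = [0 if i in indices else all_ones for i in range(len(vec))]
    return [v & m for v, m in zip(vec, mask)]

vec = [1, 2, 3, 4]
```

For vec above, clearing lanes 0 and 2 by either route produces [0, 2, 0, 4].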

Differential Revision: https://reviews.llvm.org/D145939
2023-03-15 09:56:26 +00:00
Kito Cheng
cf40b8a4dd [RISCV] Pass vector argument by stack correctly.
We have argument lowering logic that prevents floating-point values from
being passed with bit-conversion, but that rule should not be applied to
vector arguments.

---

How arguments are passed to `foo`:

```
tail call void @foo(i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0,
                    <vscale x 16 x float> zeroinitializer,
                    <vscale x 16 x float> zeroinitializer,
                    <vscale x 16 x float> zeroinitializer)
```

`foo` takes 11 arguments: the first 8 arguments are passed in GPRs and the next 2 LMUL 8
vector arguments are passed in v8-v23. At that point we have run out of both GPR and
vector argument registers, so the last LMUL 8 vector argument must be passed on the stack.

This means we should reserve `vlenb * 8` bytes of stack for the last
vector argument.
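
The size can be worked through for a concrete vector length (Python, for illustration only; the helper names are hypothetical). The sketch assumes the RISC-V convention that vscale is VLEN / 64.

```python
def lmul8_stack_bytes(vlen_bits):
    vlenb = vlen_bits // 8   # length of one vector register in bytes
    return vlenb * 8         # an LMUL 8 value spans eight vector registers

def nxv16f32_bytes(vlen_bits):
    # <vscale x 16 x float>: 16 floats of 4 bytes each per vscale unit.
    vscale = vlen_bits // 64
    return vscale * 16 * 4
```

With VLEN = 128 both functions give 128 bytes, so `vlenb * 8` matches the size of a `<vscale x 16 x float>` value exactly.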

Reviewed By: craig.topper, asb

Differential Revision: https://reviews.llvm.org/D145938
2023-03-15 17:22:47 +08:00
Kito Cheng
ba1c7731f1 [RISCV] Precommit test to show wrong way to pass scalable FP vector on stack
Test case demonstrating that passing a scalable vector on the stack causes stack corruption.

Detailed explanation of what happens:

```
tail call void @foo(i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0,
                    <vscale x 16 x float> zeroinitializer,
                    <vscale x 16 x float> zeroinitializer,
                    <vscale x 16 x float> zeroinitializer)
```

`foo` takes 11 arguments: the first 8 arguments are passed in GPRs and the next 2 LMUL 8
vector arguments are passed in v8-v23. At that point we have run out of both GPR and
vector argument registers, so the last LMUL 8 vector argument must be passed on the stack.

However, LLVM only reserves 8 bytes on the stack for the LMUL 8 vector
argument, which causes stack corruption when we try to store the argument
to the stack.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D145934
2023-03-15 17:21:07 +08:00