fixed: #59095
Update libcall signatures to use multivalue return rather than returning via a pointer
when the multivalue feature is enabled in the WebAssembly backend.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D146271
The 'ELPM' instruction has three forms:
| form        | feature  |
| ----------- | -------- |
| ELPM        | hasELPM  |
| ELPM Rd, Z  | hasELPMX |
| ELPM Rd, Z+ | hasELPMX |
The second form is always used in the expansion of the pseudo
instruction 'ELPMBRdZ'. However, devices that have ELPM but not
ELPMX can only emit the first form.
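The form selection described above can be sketched as a small Python model. This is only an illustration, not the backend's code: `expand_elpm_load` and its parameters are hypothetical names, and the trailing `mov` is an assumption based on the plain `ELPM` form always loading into `r0`.

```python
from typing import List


def expand_elpm_load(rd: str, has_elpm: bool, has_elpmx: bool) -> List[str]:
    """Return an illustrative instruction sequence that loads one byte via Z into rd."""
    if has_elpmx:
        # Devices with ELPMX can name the destination register directly.
        return [f"elpm {rd}, Z"]
    if has_elpm:
        # Plain ELPM takes no operands and loads into r0, so copy the result
        # to rd afterwards when rd is a different register (assumed expansion).
        seq = ["elpm"]
        if rd != "r0":
            seq.append(f"mov {rd}, r0")
        return seq
    raise ValueError("device has neither ELPM nor ELPMX")


print(expand_elpm_load("r24", has_elpm=True, has_elpmx=True))
print(expand_elpm_load("r24", has_elpm=True, has_elpmx=False))
```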
Reviewed By: jacquesguan
Differential Revision: https://reviews.llvm.org/D141221
The compiler reserves the base pointer register when a function has
both a dynamically sized alloca and stack realignment. However, the
base pointer register is not defined in the X86 ABI, so users may use
this register in inline assembly. The inline assembly would then
clobber the base pointer register without the user being aware of it.
This patch creates an extra prolog that saves the stack pointer to a
scratch register and uses that register to reference arguments on the
stack. For some calling conventions (e.g. regcall), there may be
few scratch registers.
Below is example code for such a case.
```
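#include <alloca.h>
#include <stddef.h>
#include <string.h>

extern int bar(void *p);
long long foo(size_t size, char c, int id) {
  /* Over-aligned local: forces stack realignment. */
  __attribute__((__aligned__(64))) int a;
  /* Dynamically sized alloca: together with realignment, needs a base pointer. */
  char *p = (char *)alloca(size);
  /* The "S" constraint places the input operand in the si register. */
  asm volatile ("nop"::"S"(405):);
  asm volatile ("movl %0, %1"::"r"(id), "m"(a):);
  p[2] = 8;
  memset(p, c, size);
  return bar(p);
}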
```
The following prolog/epilog is emitted for this case.
```
leal 4(%esp), %ebx
.cfi_def_cfa %ebx, 0
andl $-128, %esp
pushl -4(%ebx)
...
leal 4(%ebx), %esp
.cfi_def_cfa %esp, 4
```
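The register arithmetic in the prolog can be traced with a small Python model. This is a sketch only: `align_down` and `model_prolog` are hypothetical helper names, and the starting `%esp` value is made up for illustration.

```python
def align_down(value: int, alignment: int) -> int:
    """Round value down to a multiple of alignment (a power of two)."""
    assert alignment > 0 and alignment & (alignment - 1) == 0
    return value & -alignment


def model_prolog(esp: int):
    """Trace %ebx and %esp through the three prolog instructions."""
    ebx = esp + 4                # leal 4(%esp), %ebx : points at the first stack argument
    esp = align_down(esp, 128)   # andl $-128, %esp   : realign the stack
    esp -= 4                     # pushl -4(%ebx)     : re-push the return address
    return ebx, esp


ebx, esp = model_prolog(0xFFFFD0BC)
print(hex(ebx), hex(esp))
```

After realignment, `%ebx` still addresses the incoming arguments, which is why the epilog can restore `%esp` from it with `leal 4(%ebx), %esp`.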
Differential Revision: https://reviews.llvm.org/D145650
This patch adds patterns to reduce redundant mov and sel instructions
for shift intrinsics with FalseLanesZero mode, when
FeatureExperimentalZeroingPseudos is supported.
For example, before:
mov z1.b, #0
sel z0.b, p0, z0.b, z1.b
asr z0.b, p0/m, z0.b, #7
After:
movprfx z0.b, p0/z, z0.b
asr z0.b, p0/m, z0.b, #7
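A lane-wise Python model suggests why the two sequences agree. It is a simplification under stated assumptions: vectors are lists of signed 8-bit values, the predicate is a list of booleans, `movprfx` is modeled as a separate zeroing step rather than as an instruction prefix, and all function names are hypothetical.

```python
def sel(pred, a, b):
    """sel: take lanes from a where the predicate is true, otherwise from b."""
    return [x if p else y for p, x, y in zip(pred, a, b)]


def asr_merging(pred, z, shift):
    """asr with /m predication: shift active lanes, leave inactive lanes unchanged."""
    return [x >> shift if p else x for p, x in zip(pred, z)]


def before(pred, z0, shift):
    z1 = [0] * len(z0)        # mov z1.b, #0
    z0 = sel(pred, z0, z1)    # sel z0.b, p0, z0.b, z1.b
    return asr_merging(pred, z0, shift)


def after(pred, z0, shift):
    z0 = [x if p else 0 for p, x in zip(pred, z0)]  # movprfx z0.b, p0/z, z0.b
    return asr_merging(pred, z0, shift)


pred = [True, False, True, False]
lanes = [-128, 127, 64, -1]
print(before(pred, lanes, 7), after(pred, lanes, 7))
```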
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D145551
ASMPrinter was relying on feature bits to set up extra SGPRs in the kernel
descriptor for the xnack_mask. This was broken for the dynamic XNACK "any" TID
setting, which could cause user SGPRs to be clobbered if the number of SGPRs
reserved was near a granulated block boundary.
When XNACK was enabled, this worked correctly in the ASMParser, which meant some
kernels were only failing without "-save-temps".
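The boundary effect can be illustrated with a toy calculation. Everything here is an assumption made for the sketch: the granule size of 8, the 2 extra SGPRs for xnack_mask, and the helper name `granulated_blocks` are illustrative and are not taken from the patch.

```python
def granulated_blocks(num_sgprs: int, granule: int = 8) -> int:
    """Number of granule-sized blocks needed to hold num_sgprs registers."""
    return max(1, -(-num_sgprs // granule))  # ceiling division, at least one block


XNACK_MASK_SGPRS = 2  # assumed size of the extra xnack_mask reservation

for used in (12, 15):
    without_extra = granulated_blocks(used)
    with_extra = granulated_blocks(used + XNACK_MASK_SGPRS)
    print(used, without_extra, with_extra)
```

In this model a kernel using 12 SGPRs gets the same block count either way, while one using 15 SGPRs needs an additional block once the extra registers are counted, which is the situation where under-reserving can clobber user SGPRs.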
Fixes: SWDEV-382764
Reviewed By: kzhuravl
Differential Revision: https://reviews.llvm.org/D145401
This can prevent unnecessarily hoisting out of loops.
Test case cribbed from AArch64.
I also intend to make them rematerializable.
Differential Revision: https://reviews.llvm.org/D146314
Summary:
In PPCAIXAsmPrinter::emitTracebackTable(), the "IsBackChainStored" bit of the traceback
table is always set to true. This causes the AIX debugging tool "dbx" to emit the error
"libdebug assertion (framep->getGpr(STKP, &addr) == DB_SUCCESS && *nextStkpp == addr)"
when debugging a leaf function with no stack frame.
For a leaf function with no stack frame, the IsBackChainStored bit should be unset.
Reviewers: ChenZheng
Differential Revision: https://reviews.llvm.org/D146071
This is a mitigation patch for
https://bugs.chromium.org/p/llvm/issues/detail?id=30, where existing stack
protection is skipped if a function is returned through by an unwinder rather
than the normal call/return path. The recent patch D139254 added the ability to
instrument a visible unwind path, at least in the IR case (I'm working on the
SelectionDAG instrumentation too) but there are still invisible unwinds it
can't reach.
So this patch adds logic to DwarfEHPrepare that goes through a function,
converting any call that might throw into an invoke to a simple resume cleanup,
and adding cleanup clauses to existing landingpads that lack them. Obviously we
don't really want to do this if it's wasted effort, so I also exposed
requiresStackProtector from the actual StackProtector code to skip the extra
paths if they won't be used.
Changes:
* Move test to AArch64 directory as it relies on target presence.
* Re-add Dominator-tree maintenance. Accidentally cherry-picked wrong patch.
* Skip adding paths on Windows EH functions.
https://reviews.llvm.org/D143637
It seems the ISA manual's pseudo-code description for the
`BYTEPICK.[WD]` instructions is inaccurate; the behavior described here
should be correct though. The instructions' names are misleading too
(they pick full GRLen-wide words instead of bytes; they just index by
bytes) but let's stick to the official names for now.
Reviewed By: SixWeining
Differential Revision: https://reviews.llvm.org/D143880
The added tests were crashing because the zip instruction was returning two destination operands. According to Arm, ZIP returns only one destination operand.
Reviewed By: dmgreen, fhahn
Differential Revision: https://reviews.llvm.org/D146055
There are no 32-bit targets that have LZCNT but not CMOV, and this allows us to test the straight-line i64 pattern; otherwise we're doing the same branchy code as the 32-bit BSR test.
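As a rough illustration of a straight-line i64 pattern, a 64-bit leading-zero count can be built from two 32-bit counts and a select instead of a branch. This Python sketch uses hypothetical helper names and does not claim to match the exact instruction sequence the backend emits.

```python
def lzcnt32(x: int) -> int:
    """Model of 32-bit LZCNT: returns 32 for a zero input."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()


def lzcnt64_via_halves(x: int) -> int:
    """Count leading zeros of a 64-bit value from its two 32-bit halves."""
    lo = x & 0xFFFFFFFF
    hi = (x >> 32) & 0xFFFFFFFF
    # Both candidates are computed unconditionally (straight-line code).
    hi_count = lzcnt32(hi)
    lo_count = lzcnt32(lo) + 32
    # The final choice models a CMOV-style select rather than a branch.
    return hi_count if hi != 0 else lo_count


print(lzcnt64_via_halves(1), lzcnt64_via_halves(1 << 32))
```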
Currently we don't emit any CFI instructions for the SCS register when
enabling SCS on RISCV. This causes problems when unwinding, since the
SCS register isn't being handled properly.
Reviewed By: mcgrathr
Differential Revision: https://reviews.llvm.org/D145205
For D141247 - if that pattern was used by GISel it could cause constant bus limitation failures.
Just use inline immediates instead of S_MOV to avoid the issue.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D146131
This is an alternative fix to D145497, which also addresses
https://github.com/llvm/llvm-project/issues/60918
In D124457 which added the original code for this, @efriedma pointed
out that it wasn't safe to assume that FI #0 would be allocated at offset
0, but that part of the patch went in without any changes.
The downside of this solution is that any access to an object on the
stack that has been allocated at SP + 0, still gets moved to a separate
register first, which degrades performance.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D146056
Extend the existing store(load()) removal code to account for intermediate truncates that some targets won't remove with canCombineTruncStore - we only care about the load/store MemoryVT.
Fixes regression from D146121
This also allows us to make use of the existing isVectorClearMaskLegal shuffle canonicalization
Differential Revision: https://reviews.llvm.org/D145939
We have argument lowering logic that prevents floating-point values from
being passed with bit-conversion, but that rule should not be applied to
vector arguments.
---
How arguments are passed to `foo`:
```
tail call void @foo(i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0,
<vscale x 16 x float> zeroinitializer,
<vscale x 16 x float> zeroinitializer,
<vscale x 16 x float> zeroinitializer)
```
`foo` takes 11 arguments: the first 8 arguments are passed in GPRs, and the next
2 LMUL 8 vector arguments are passed in v8-v23. At that point we have run out of
both GPR and vector argument registers, so the last LMUL 8 vector argument must
be passed on the stack.
This means we should reserve `vlenb * 8` bytes of stack for the last
vector argument.
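The `vlenb * 8` figure can be checked with a short calculation. This sketch assumes LLVM's RISC-V convention that `vscale = VLEN / 64`; the function name and the sample VLEN values are illustrative.

```python
def scalable_vector_bytes(vlen_bits: int, min_elts: int, elt_bits: int) -> int:
    """Size in bytes of <vscale x min_elts x iN/fN> for a given VLEN."""
    vscale = vlen_bits // 64  # assumed: vscale = VLEN / 64 on RISC-V
    return vscale * min_elts * elt_bits // 8


for vlen in (128, 256, 512):
    vlenb = vlen // 8  # VLEN in bytes
    needed = scalable_vector_bytes(vlen, min_elts=16, elt_bits=32)
    print(vlen, needed, vlenb * 8)
```

For `<vscale x 16 x float>` the two columns agree for each VLEN, so the stack slot size scales with the hardware vector length rather than being a fixed constant.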
Reviewed By: craig.topper, asb
Differential Revision: https://reviews.llvm.org/D145938
Test case demonstrating that passing a scalable vector on the stack causes stack corruption.
Detailed explanation of what happens:
```
tail call void @foo(i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0,
<vscale x 16 x float> zeroinitializer,
<vscale x 16 x float> zeroinitializer,
<vscale x 16 x float> zeroinitializer)
```
`foo` takes 11 arguments: the first 8 arguments are passed in GPRs, and the next
2 LMUL 8 vector arguments are passed in v8-v23. At that point we have run out of
both GPR and vector argument registers, so the last LMUL 8 vector argument must
be passed on the stack.
However, LLVM only reserves 8 bytes on the stack for the LMUL 8 vector
argument, which causes stack corruption when we try to store it to the
stack.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145934