Similar to 3f46e5453d9310b15d974e876f6132e3cf50c4b1, this patch allows
the backend to produce a faster access sequence for the local-exec TLS
model, where loading from the TOC can be avoided, for local-exec TLS
variables that are annotated with the "aix-small-tls" attribute.
The expectation is for local-exec TLS variables to be set with this
attribute through PGO. Furthermore, the optimized access sequence is
only generated for local-exec TLS variables annotated with
"aix-small-tls", only if they are less than ~32KB in size.
Hi, I spotted a problem when running benchmarking programs on a RISCV64
device.
## Issue
Segmentation faults only occurred while running the programs compiled
with `GlobalISel` enabled.
Here is a small but complete example (it is adopted from [Google's
benchmark
framework](95a9f0d0b4/MicroBenchmarks/libs/benchmark/src/colorprint.cc (L85-L119))
to reproduce the issue,
```cpp
#include <cstdarg>
#include <cstdio>
#include <iostream>
#include <memory>
#include <string>
std::string FormatString(const char* msg, va_list args) {
// we might need a second shot at this, so pre-emptivly make a copy
va_list args_cp;
va_copy(args_cp, args);
std::size_t size = 256;
char local_buff[256];
auto ret = vsnprintf(local_buff, size, msg, args_cp);
va_end(args_cp);
// currently there is no error handling for failure, so this is hack.
// BM_CHECK(ret >= 0);
if (ret == 0) // handle empty expansion
return {};
else if (static_cast<size_t>(ret) < size)
return local_buff;
else {
// we did not provide a long enough buffer on our first attempt.
size = static_cast<size_t>(ret) + 1; // + 1 for the null byte
std::unique_ptr<char[]> buff(new char[size]);
ret = vsnprintf(buff.get(), size, msg, args);
// BM_CHECK(ret > 0 && (static_cast<size_t>(ret)) < size);
return buff.get();
}
}
std::string FormatString(const char* msg, ...) {
va_list args;
va_start(args, msg);
auto tmp = FormatString(msg, args);
va_end(args);
return tmp;
}
int main() {
std::string Str =
FormatString("%-*s %13s %15s %12s", static_cast<int>(20),
"Benchmark", "Time", "CPU", "Iterations");
std::cout << Str << std::endl;
}
```
Use `clang++ -fglobal-isel -o main main.cpp` to compile it.
## Cause
I have examined MIR, it shows that these segmentation faults resulted
from a small mistake about legalizing the intrinsic function
`llvm.va_copy`.
36e74cfdbd/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.cpp (L451-L453)
`DstLst` and `Tmp` are placed in the wrong order.
## Changes
I have tweaked the test case `CodeGen/RISCV/GlobalISel/vararg.ll` so
that `s0` is used as the frame pointer (not in all checks) which points
to the starting address of the save area. I believe that it helps reason
about how `llvm.va_copy` is handled.
This PR improves type inference in general and deduces types of
composite data structures in particular. Also added a way to insert a
bitcast to make a fun call valid in case of arguments types mismatch due
to opaque pointers type inference.
The attached test `pointers/nested-struct-opaque-pointers.ll`
demonstrates new capabilities: the SPIRV code emitted for this test is
now (1) valid in a sense of data field types and (2) accepted by
`spirv-val`.
More strict LIT checks, support of more composite data structures and
improvement of fun calls from the perspective of type correctness are
main todo's at the moment.
This reverts commit 52431fdb1ab8d29be078edd55250e06381e4b6b0.
The PR assumed `__threwValue` couldn't be 0, but it could be when the
thrown thing is not a longjmp but an exception, so that `if` check was
actually necessary.
We try clamp the index to be within the bounds of the stack object
we create, but if we don't freeze it, poison can propagate into the
clamp code. This can cause the access to leave the bounds of the
stack object.
We have other instances of this issue in type legalization and extract_elt/subvector,
but posting this patch first for direction check.
Fixes#86717
combineExtractFromVectorLoad no longer uses the vector we're extracting from to determine the pointer offset calculation, allowing us to extract from types that have been bitcast to work with specific target shuffles.
Fixes#85419
For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code.
This patch updates the frame lowering code to calculate the offsets as 64-bit values and resolves the overflows, resulting in the correct codegen for very large frames.
Fixes#48911
[RISCV] RISCV vector calling convention (1/2)
This is the vector calling convention based on
https://github.com/riscv-non-isa/riscv-elf-psabi-doc,
the idea is to split between "scalar" callee-saved registers
and "vector" callee-saved registers. "scalar" ones remain the
original strategy, however, "vector" ones are handled together
with RVV objects.
The stack layout would be:
|--------------------------| <-- FP
| callee-allocated save |
| area for register varargs|
|--------------------------|
| callee-saved registers | <-- scalar callee-saved
| (scalar) |
|--------------------------|
| RVV alignment padding |
|--------------------------|
| callee-saved registers | <-- vector callee-saved
| (vector) |
|--------------------------|
| RVV objects |
|--------------------------|
| padding before RVV |
|--------------------------|
| scalar local variables |
|--------------------------| <-- BP
| variable size objects |
|--------------------------| <-- SP
Note: This patch doesn't contain "tuple" type, e.g. vint32m1x2.
It will be handled in https://github.com/riscv-non-isa/riscv-elf-psabi-doc (2/2).
Differential Revision: https://reviews.llvm.org/D154576
The AMDGPUSimplifyLibCalls pass was lowering function calls with the
strictfp attribute to sequences that included function calls incorrectly
lacking the attribute. This patch corrects that.
The pass now also emits the correct constrained fp call instead of
normal FP instructions when in a function with the strictfp attribute.
Replacing non-constrained calls with constrained calls when required
is still on the IRBuilder's TODO list.
Adjust logic of 1cb9f37a17ab to match freebsd/freebsd-src@9a4d48a645.
D113443 is the original attempt to bring this FreeBSD patch to
llvm-project,
but it never landed. This change is required to build FreeBSD kernel
modules
with -fstack-protector using a standard LLVM toolchain. The FreeBSD
kernel
loader does not handle R_X86_64_REX_GOTPCRELX relocations.
Fixes#50932.
Adds logic to the IR verifier that checks whether !tbaa.struct nodes are
well-formed. That is, it checks that the operands of !tbaa.struct nodes
are in groups of three, that each group of three operands consists of
two integers and a valid tbaa node, and that the regions described by
the offset and size operands are non-overlapping.
PR: https://github.com/llvm/llvm-project/pull/86709
Unlike add, sub and mul, we don't have widening instructions for div,
rem and logical ops, so we don't have any test coverage if we were to
extend combineBinOpOfZExts to handle them.
Adding tests coincidentally revealed that logical ops are already
narrowed as a generic DAG combine via
DAGCombiner::hoistLogicOpWithSameOpcodeHands. So we don't actually need
to run combineBinOpOfZExts on them.
This an alternative to #84935 to fix the miscompile, but not be optimal.
The immediate for cm.push/pop must be a multiple of 16. For RVE, it
might not be. It's not easy to increase the stack size without messing
up cfa directives and maybe other things.
This patch rounds the stack size down to a multiple of 16 before
clamping it to 48. This causes an extra addi to be emitted to handle the
remainder.
Once this commited, I can commit #84989 to add verification for these
instructions being generated with valid offsets.
G_VSCALE should be lowered using VLENB. If the type is not sXLen it
should be lowered using a G_VSCALE on the narrow type and a G_MUL.
regbank select and instruction select are straightforward so we really
only need to add tests to show it works.
When folding (ashr (shl, x, c1), c2) we need to treat c1 and c2
as unsigned to find out if the combined shift should be a left
or right shift.
Also do an early out during pre-legalization in case c1 and c2
has differet types, as that otherwise complicated the comparison
of c1 and c2 a bit.
It has been noticed that combineShiftRightArithmetic isn't dealing
properly with large shift amounts, as demonstrated by the test
case added in this commit.
I think the problem partly is related to X86 using i8 as shift amount
type during ISel. So shift amount larger then 127 may be treated
as negative shift amounts if not being careful.
Building on #86248, we can also narrow the width of a mul of zexts.
This is specifically legal because on RVV we always extend to the next
power of 2 width, and multiplying two N bit integers produces a maximum
value of 2\*N bits.
So as long as we keep an inner zext of 2\*N, we will have enough space
for the multiply and won't overflow.
Alive2 proof: https://alive2.llvm.org/ce/z/XteYyb
FADD/FSUB/FMUL are usually less port-bound than INSERT_SUBVECTOR, so only concatenate if it reduces the instruction count and doesn't introduce extra INSERT_SUBVECTOR nodes.
We don't want to concat fadd/fsub/fmul if both operands would need concatenating (as the fp op is usually cheaper than the concat), but if at least one operand is free to concat (i.e. constant or extracted from a wider vector), then we should try to concat the fp op.
Currently patchpoints can only have two result types, `void` and `i64`.
This limits the result to general purpose registers.
This patch makes `patchpoint.i64` an overloadable intrinsic, allowing
result values that can fit in a single register (e.g. integers,
pointers, floats).
Similar to how we protected FP/fixed-vector arguments and results from
calls, we should do the same for arguments/results from locally-streaming
functions such that those are not spilled/filled as ZPR registers.
This may cause a small regression (additional spills/fills), which is
addressed by #85386.