Debug info correlation is an option in InstrProfiling pass, which is used by
both IR instrumentation and front-end instrumentation. So, Clang coverage can
also benefits the binary size saving from it.
Reviewed By: ellis
Differential Revision: https://reviews.llvm.org/D157913
2 of them were missing the ":" on the end.
Adding them broke the test so I had to add a new prefix just for the
warning runs only.
I manually checked the first RUNs and there is no warning emitted, as
expected.
MSVC allows users to pass structures with required alignments greater
than 4 to variadic functions. It does not pass them indirectly to
correctly align them. Instead, it passes them directly with the usual 4
byte stack alignment.
This change implements the same logic in clang on the passing side. The
receiving side (va_arg) never implemented any of this indirect logic, so
it doesn't need to be updated.
This issue pre-existed, but @aaron.ballman noticed it when we started
passing structs containing aligned fields indirectly in D152752.
The new ACLE PR#225[1] now combines the slice parameters for some
builtins.
Slice specifies the ZA slice number directly and needs to be explicity
implemented by the "user" with the base register plus the immediate
offset
[1]https://github.com/ARM-software/acle/pull/225/files
Summary:
Currently, there is an assertion that prevents us from emitting an
AMDGPU global with a non-target specific address space (i.e. numerical
attribute). I'm unsure what the original intentions of this assertion
were, but we should be able to use OpenCL address spaces when compiling
directly to AMDGPU from C++. This is permitted on NVPTX so I'm unsure
what this assertion is guarding. The patch simply removes the assertion
and adds a test to ensure that these emit the expected address spaces.
Fixes https://github.com/llvm/llvm-project/issues/65069
Summary:
We use the `llvm.amgcn.abi.version` varaible to control code generation.
This is emitted in every module now to indicate what should be used when
compiling. Previously, the logic caused us to emit an external reference
to this variable when creating the code for the `none` type. This would
then cause us not to emit the actual definition. This patch refines the
logic to create the external reference, and then update it if it is
found unset by the time we emit the global. I had to remove the
reference to `GetOrCreateLLVmGlobal` because it did not accept the
proper address space.
This reverts commit c6a33ff49dfb3498dae15c718820ea3d9c19f3cb. Makes
clang segfault.
// clang t.cc
class a;
class c {
public:
[[clang::annotate("")]] c(const c *) {}
};
class d {
d(const c *, a *, a *);
c e;
};
d::d(const c *f, a *, a *) : e(f) {}
Previously, annotations were only emitted for function definitions. With
this change annotations are also emitted for declarations. Also, emitting
function annotations is now deferred until the end so that the most
up to date declaration is used which will have any inherited annotations.
Differential Revision: https://reviews.llvm.org/D156172/new/
The new ACLE PR#225[1] now combines the slice parameters for some
builtins. This patch is the #2 of 3 patches to update the interface.
Slice specifies the ZA slice number directly and needs to be explicity
implemented by the "user" with the base register plus the immediate
offset
[1]https://github.com/ARM-software/acle/pull/225/files
When building the LoongArch Linux kernel without
`CONFIG_DYNAMIC_FTRACE`, the build fails to link because the mcount
symbol is `mcount`, not `_mcount` like GCC generates and the kernel
expects:
```
ld.lld: error: undefined symbol: mcount
>>> referenced by version.c
>>> init/version.o:(early_hostname) in archive vmlinux.a
>>> referenced by do_mounts.c
>>> init/do_mounts.o:(rootfs_init_fs_context) in archive vmlinux.a
>>> referenced by main.c
>>> init/main.o:(__traceiter_initcall_level) in archive vmlinux.a
>>> referenced 97011 more times
>>> did you mean: _mcount
>>> defined in: vmlinux.a(arch/loongarch/kernel/mcount.o)
```
Set `MCountName` in `LoongArchTargetInfo` to `_mcount`, which resolves
the build failure.
This is an alternative of D157485 and a pre-feature to support AVX10.
AVX10 Architecture Specification: https://cdrdv2.intel.com/v1/dl/getContent/784267
AVX10 Technical Paper: https://cdrdv2.intel.com/v1/dl/getContent/784343
RFC: https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661
Based on the feedbacks from LLVM and GCC community, we have agreed to
start from supporting `-m[no-]evex512` on existing AVX512 features.
The option `-mno-evex512` can be used with `-mavx512xxx` to build
binaries that can run on both legacy AVX512 targets and AVX10-256.
There're still arguments about what's the expected behavior when this
option as well as `-mavx512xxx` used together with `-mavx10.1-256`. We
decided to defer the support of `-mavx10.1` after we made consensus.
Or furthermore, we start from supporting AVX10.2 and not providing any
AVX10.1 options.
Reviewed By: RKSimon, skan
Differential Revision: https://reviews.llvm.org/D159250
Buitlins from AMD's device-libs are compiled without specifying a
target-cpu, which results in builtins without the target-features
attribute set.
Before this patch, when linking this builtins with -mlink-builtin-bitcode
the target-features were not propagated in the incoming builtins.
With this patch, the default target features are propagated
if they are compatible with the target-features in the incoming builtin.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D159206
This patch enables some fp16 vector type builtins that don't use fp arithmetic instruction for zvfhmin without zvfh.
Include following builtins:
vector load/store,
vector reinterpret,
vmerge_vvm,
vmv_v.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D151869
This is an alternative of D157485 and a pre-feature to support AVX10.
AVX10 Architecture Specification: https://cdrdv2.intel.com/v1/dl/getContent/784267
AVX10 Technical Paper: https://cdrdv2.intel.com/v1/dl/getContent/784343
RFC: https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661
Based on the feedbacks from LLVM and GCC community, we have agreed to
start from supporting `-m[no-]evex512` on existing AVX512 features.
The option `-mno-evex512` can be used with `-mavx512xxx` to build
binaries that can run on both legacy AVX512 targets and AVX10-256.
There're still arguments about what's the expected behavior when this
option as well as `-mavx512xxx` used together with `-mavx10.1-256`. We
decided to defer the support of `-mavx10.1` after we made consensus.
Or furthermore, we start from supporting AVX10.2 and not providing any
AVX10.1 options.
Reviewed By: RKSimon, skan
Differential Revision: https://reviews.llvm.org/D159250
As reported by @uweigand in https://reviews.llvm.org/D158883:
```
The newly added test cases in ffp-model.c fail on SystemZ, making CI red:
https://lab.llvm.org/buildbot/#/builders/94/builds/16280
The root cause seems to be that by default, the SystemZ back-end targets
a machine without SIMD support, and therefore vector return types are
passed via implicit reference according to the ABI
```
this uses manual stores instead of vector returns.
mffsl is available since ISA 3.0. The builtin is named with ppc prefix
to follow our convention. For targets earlier than power9, GCC generates
extra code to support the functionality, while this patch does not
implement such behavior.
Reviewed By: nemanjai, tuliom
Differential Revision: https://reviews.llvm.org/D158065
Otherwise these functions are not inlined when invoked from streaming
functions.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D159188
Pick up a few more codegen parametres from downstream. The DWARF version matchesthe GCC config.
DWARF version 4, Math Errno handling, ObjC ABI handling, and debug handling.
Reviewed By: nielx
Differential Revision: https://reviews.llvm.org/D159409
Builtin function `__builtin_isfpclass` now can be called for a vector
of floating-point values. In this case it is applied to the vector
elementwise and produces vector of integer values.
Differential Revision: https://reviews.llvm.org/D153339
In GCC, the .refptr stubs are only generated for x86_64, and only
for code models medium and larger (and medium is the default for
x86_64 since this was introduced). They can be omitted for
projects that are conscious about performance and size, and don't
need automatically importing dll data members, by passing -mcmodel=small.
In Clang/LLVM, such .refptr stubs are generated for any potentially
symbol reference that might end up autoimported. The .refptr stubs
are emitted for three separate reasons:
- Without .refptr stubs, undefined symbols are mostly referenced
with 32 bit wide relocations. If the symbol ends up autoimported
from a different DLL, a 32 bit relative offset might not be
enough to reference data in a different DLL, depending on runtime
loader layout.
- Without .refptr stubs, the runtime pseudo relocation mechanism
will need to temporarily make sections read-write-executable
if there are such relocations in the text section
- On ARM and AArch64, the immediate addressing encoded into
instructions isn't in the form of a plain 32 bit relative offset,
but is expressed with various bits scattered throughout two
instructions - the mingw runtime pseudo relocation mechanism
doesn't support updating offsets in that form.
If autoimporting is known not to be needed, the user can now
compile with -fno-auto-import, avoiding the extra overhead of
the .refptr stubs.
However, omitting them is potentially fragile as the code
might still rely on automatically importing some symbol without
the developer knowing. If this happens, linking still usually
will succeed, but users may encounter issues at runtime.
Therefore, if the new option -fno-auto-import is passed to the compiler
when driving linking, it passes the flag --disable-auto-import to
the linker, making sure that no symbols actually are autoimported
when the generated code doesn't expect it.
Differential Revision: https://reviews.llvm.org/D61670
`clang` currently defaults to DWARF 2 on Solaris. This dates back to LLVM
3.8.0. I suspect this is related to `gcc/config/sol2.cc`
(`solaris_override_options`) doing the same unless
`HAVE_LD_EH_FRAME_CIEV3`. The latter is 1 on both Solaris 11.3 and 11.4,
so the workaround has become irrelevant these days.
This patch removes the Solaris override, adjusting affected testcases
accordingly.
Tested on `amd64-pc-solaris2.11` (`Release` and `Debug` builds) and
`sparcv9-sun-solaris2.11` (`Release` build).
Differential Revision: https://reviews.llvm.org/D159352
For vector * scalar + vector, we emit `fmuladd` directly from clang.
This enables it also for matrix * scalar + matrix.
rdar://113967122
Differential Revision: https://reviews.llvm.org/D158883
Summary:
There is no support in XCOFF for labels on common symbols. Therefore, an alias for a common symbol is not supported. Issue an error in the front end when an aliasee is a common symbol. Issue a similar error in the back end in case an IR specifies an alias for a common symbol.
Reviewed by: hubert.reinterpretcast, DiggerLin
Differential Revision: https://reviews.llvm.org/D158739
For function multi-versioning using the target or target_clones
function attributes, currently we incorrectly set comdat for internal
linkage resolvers. This is problematic for ELF linkers
as GRP_COMDAT deduplication will kick in even with STB_LOCAL signature
(https://groups.google.com/g/generic-abi/c/2X6mR-s2zoc
"GRP_COMDAT group with STB_LOCAL signature").
In short, two `__attribute((target_clones(...))) static void foo()`
in two translation units will be deduplicated. Fix this.
Fix#65114
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D158963
FMADD, FMSUB instructions perform better or the same compared to indexed
FMLA, FMLS.
For example, the Arm Cortex-A55 Software Optimization Guide lists "FP
multiply accumulate" FMADD, FMSUB instructions with a throughput of 2
IPC, whereas it lists "ASIMD FP multiply accumulate, by element" FMLA,
FMLS with a throughput of 1 IPC.
The Arm Cortex-A77 Software Optimization Guide, however, does not
separately list "by element" variants of the "ASIMD FP multiply
accumulate" instructions, which are listed with the same throughput of 2
IPC as "FP multiply accumulate" instructions.
Reviewed By: samtebbs, dzhidzhoev
Differential Revision: https://reviews.llvm.org/D158008
MSVC defines __declspec(noalias) as follows (https://learn.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2012/k649tyc7(v=vs.110)?redirectedfrom=MSDN):
> noalias means that a function call does not modify or reference
> visible global state and only modifies the memory pointed to
> directly by pointer parameters (first-level indirections).
> If a function is annotated as noalias, the optimizer can assume
> that, in addition to the parameters themselves, only first-level
> indirections of pointer parameters are referenced or modified
> inside the function. The visible global state is the set of all
> data that is not defined or referenced outside of the compilation
> scope, and their address is not taken. The compilation scope is
> all source files (/LTCG (Link-time Code Generation) builds) or a
> single source file (non-/LTCG build).
The wording is not super clear to me, but I believe this is saying
that __declspec(noalias) functions may access inaccessible memory
(i.e. non-visible global state in their words). Indeed, the Windows
CRT applies this attribute to malloc, which does access inaccessible
memory under LLVM's memory model.
As such, change the attribute to emit
memory(argmem: readwrite, inaccessiblemem: readwrite) instead of
memory(argmem: readwrite).
Fixes https://github.com/llvm/llvm-project/issues/64827.
Differential Revision: https://reviews.llvm.org/D158984
We have a new policy in place making links to private resources
something we try to avoid in source and test files. Normally, we'd
organically switch to the new policy rather than make a sweeping change
across a project. However, Clang is in a somewhat special circumstance
currently: recently, I've had several new contributors run into rdar
links around test code which their patch was changing the behavior of.
This turns out to be a surprisingly bad experience, especially for
newer folks, for a handful of reasons: not understanding what the link
is and feeling intimidated by it, wondering whether their changes are
actually breaking something important to a downstream in some way,
having to hunt down strangers not involved with the patch to impose on
them for help, accidental pressure from asking for potentially private
IP to be made public, etc. Because folks run into these links entirely
by chance (through fixing bugs or working on new features), there's not
really a set of problematic links to focus on -- all of the links have
basically the same potential for causing these problems. As a result,
this is an omnibus patch to remove all such links.
This was not a mechanical change; it was done by manually searching for
rdar, radar, radr, and other variants to find all the various
problematic links. From there, I tried to retain or reword the
surrounding comments so that we would lose as little context as
possible. However, because most links were just a plain link with no
supporting context, the majority of the changes are simple removals.
Differential Review: https://reviews.llvm.org/D158071
GCC 12 (https://gcc.gnu.org/PR101696) allows
__builtin_cpu_supports("x86-64") (and -v2 -v3 -v4).
This patch ports the feature.
* Add `FEATURE_X86_64_{BASELINE,V2,V3,V4}` to enum ProcessorFeatures,
but keep CPU_FEATURE_MAX unchanged to make
FeatureInfos/FeatureInfos_WithPLUS happy.
* Change validateCpuSupports to allow `x86-64{,-v2,-v3,-v4}`
* Change getCpuSupportsMask to return `std::array<uint32_t, 4>` where
`x86-64{,-v2,-v3,-v4}` set bits `FEATURE_X86_64_{BASELINE,V2,V3,V4}`.
* `target("x86-64")` and `cpu_dispatch(x86_64)` are invalid. Tested by commit 9de3b35ac9159d5bae6e6796cb91e4f877a07189
Close https://github.com/llvm/llvm-project/issues/59961
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D158811
The implementation of __builtin_strncmp and other related builtins function use
getExtValue() to evaluate the size argument. This can cause a crash when the
value does not fit into an int64_t value, which is can be expected since the
type of the argument is size_t.
The fix is to switch to using getZExtValue().
This fixes: https://github.com/llvm/llvm-project/issues/64876
Differential Revision: https://reviews.llvm.org/D158557
Certain instrumentations set the !nosanitize metadata for inserted
instructions, which are generally not interested for sanitizers. Skip
tsan instrumentation like we do for asan (D126294)/msan/hwasan.
-fprofile-arcs instrumentation has data race unless
-fprofile-update=atomic is specified. Let's remove the the `__llvm_gcov`
special case from commit 0222adbcd25779a156399bcc16fde9f6d083a809 (2016)
as the racy instructions have the !nosanitize metadata.
(-fprofile-arcs instrumentation does not use `__llvm_gcda` as global variables.)
```
std::atomic<int> c;
void foo() { c++; }
int main() {
std::thread th(foo);
c++;
th.join();
}
```
Tested that `clang++ --coverage -fsanitize=thread a.cc && ./a.out` does
not report spurious tsan errors.
Also remove the default CC1 option -fprofile-update=atomic for
-fsanitize=thread to make options more orthogonal.
Reviewed By: Enna1
Differential Revision: https://reviews.llvm.org/D158385
The fp options specified through pragma are already encoded in Expr.
This patch takes the same approach used by clang codegen to emit
fastmath flags for fadd insts, basically use RAII to set the
current fastmath flags in IRBuilder, which is then used to emit
sqrt intrinsic.
Fixes: https://github.com/llvm/llvm-project/issues/64653