Summary:
There were a few tests that weren't enabled on the GPU. This is because
the logic caused them to be skipped as we don't use CPU featured on the
host. This also disables the logic making multiple versions of the
memory functions.
The previous patches added the necessary support for global constructors
used to register tests. This patch enables the AMDGPU target to build
and run the unit tests on the GPU. Currently this only tests the `ctype`
tests, but adding more should be straightforward from here on.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D149517
This implements the same behavior as D141997 but makes sure that the same detection mechanism is used between CMake and source code.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D142108
This implements the same behavior as D141997 but makes sure that the same detection mechanism is used between CMake and source code.
Differential Revision: https://reviews.llvm.org/D142108
This implements the same behavior as D141997 but makes sure that the same detection mechanism is used between CMake and source code.
Differential Revision: https://reviews.llvm.org/D142108
The memory functions are highly performance sensitive and use builtins
where possible, but also need to define those functions names when they
don't exist to avoid compilation errors. Previously all those
redefinitions were behind the SSE2 flag for x86, which caused errors on
CPUs that supported SSE2 but not AVX512. This patch splits the various
CPU extensions out to avoid errors on such CPUs.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D138163
The memory functions are highly performance sensitive and use builtins
where possible, but also need to define those functions names when they
don't exist to avoid compilation errors. Previously all those
redefinitions were behind the SSE2 flag for x86, which caused errors on
CPUs that supported SSE2 but not AVX512. This patch splits the various
CPU extensions out to avoid errors on such CPUs.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D137868
Detect if the architecture supports FMA instructions and if
the targets depend on fma.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D123615
This is a roll forward of D101895 with two additional fixes:
Original Patch description:
> This is a follow up on D101524 which:
>
> - simplifies cpu features detection and usage,
> - flattens target dependent optimizations so it's obvious which implementations are generated,
> - provides an implementation targeting the host (march/mtune=native) for the mem* functions,
> - makes sure all implementations are unittested (provided the host can run them).
Additional fixes:
- Fix uninitialized ALL_CPU_FEATURES
- Use non pseudo microarch as it is only supported from Clang 12 on
Differential Revision: https://reviews.llvm.org/D102233
This reverts commit 541f107871bc9c020925a6e5342542a47c902d12 as the bots
are failing with unknown architecture "x86-64-v*". Will let the original
author decide on the right course of action to correct the problem and
reland.
This is a follow up on D101524 which:
- simplifies cpu features detection and usage,
- flattens target dependent optimizations so it's obvious which implementations are generated,
- provides an implementation targeting the host (march/mtune=native) for the mem* functions,
- makes sure all implementations are unittested (provided the host can run them),
- makes sure all implementations are benchmarkable (provided the host can run them).
Differential Revision: https://reviews.llvm.org/D101895
Current implementation defines LIBC_TARGET_MACHINE with the use of CMAKE_SYSTEM_PROCESSOR.
Unfortunately CMAKE_SYSTEM_PROCESSOR is OS dependent and can produce different results.
An evidence of this is the various matchers used to detect whether the architecture is x86.
This patch normalizes LIBC_TARGET_MACHINE and renames it LIBC_TARGET_ARCHITECTURE.
I've added many architectures but we may want to limit ourselves to x86 and ARM.
Differential Revision: https://reviews.llvm.org/D101524
We won't be able to run the compiled program since it will be compiled
for different system. We instead allow passing the CPU features via
CMake option in that case.
Differential Revision: https://reviews.llvm.org/D95203
The feature check should probably be enhanced for non-x86 architectures,
but this change shields them from x86 specific pieces until then.
This patch has been split out from https://reviews.llvm.org/D81533.
Summary:
The patch is not ready yet and is here to discuss a few options:
- How do we customize the implementation? (i.e. how to define `kRepMovsBSize`),
- How do we specify custom compilation flags? (We'd need `-fno-builtin-memcpy` to be passed in),
- How do we build? We may want to test in debug but build the libc with `-march=native` for instance,
- Clang has a brand new builtin `__builtin_memcpy_inline` which makes the implementation easy and efficient, but:
- If we compile with `gcc` or `msvc` we can't use it, resorting on less efficient code generation,
- With gcc we can use `__builtin_memcpy` but then we'd need a postprocess step to check that the final assembly do not contain call to `memcpy` (unlikely but allowed),
- For msvc we'd need to resort on the compiler optimization passes.
Reviewers: sivachandra, abrachet
Subscribers: mgorny, MaskRay, tschuett, libc-commits, courbet
Tags: #libc-project
Differential Revision: https://reviews.llvm.org/D74397