The PREFETCHW instruction was originally part of the 3DNow. But
it was given its own CPUID bit on later CPUs just before 3DNow
was deprecated.
We were setting the -mprfchw flag if -m3dnow was passed or the CPU
supported 3dnow unless -mno-prfchw was passed. But -march=native
on a CPU without the PRFCHW CPUID bit set will pass -mno-prfchw.
So -march=k8 will behave differently than -march=native on a K8
for example.
So remove this implicit setting from the frontend and instead
enable the backend to use PREFETCHW if 3dnow OR prfchw is enabled.
Also enable PRFCHW flag on amdfam10/barcelona which seems to be
where this CPUID bit was introduced. That CPU also supported
3dnow.
This patch removes the PROC macro in favor of CPUKind enum and a
table that contains information about CPUs.
The current information in the table is the CPU name, CPUKind enum
value, key feature for target multiversioning, and Is64Bit capable.
For the strings that are aliases, I've duplicated the information
in the table. This means there are more rows in the table than
CPUKind enums.
This replaces multiple StringSwitch's with loops through the table.
They are linear searches due to the table being more logically
ordered than alphabetical. The StringSwitch's would have also been
linear. I've used StringLiteral on the strings in the table so we
can quickly check the length while searching.
I contemplated having a CPUKind for each string so there was a 1:1
mapping, but didn't want to spread more names to the places that
use the enum.
My ultimate goal here is to store the features for each CPU as a
bitset within the table. Hoping to use constexpr to make this
composable so we can group features and inherit them. After the
table lookup we can turn the bitset into a list of strings for the
frontend. The current switch we have for selecting features for
CPUs has become difficult to maintain while trying to express
inheritance relationships.
Differential Revision: https://reviews.llvm.org/D82414
This was orignally done so we could separate the compatibility
values and the llvm internal only features into a separate entries
in the feature array. This was needed when we explicitly had to
convert the feature into the proper 32-bit chunk at every reference
and we didn't want things moving around.
Now everything is in an array and we have helper funtions or macros
to convert encoding to index. So we renumbering is no longer an
issue.
Similar to what some other targets have done. This information
could be reused by other frontends so doesn't make sense to live
in clang.
-Rename CK_Generic to CK_None to better reflect its illegalness.
-Move function for translating from string to enum into llvm.
-Call checkCPUKind directly from the string to enum translation
and update CPU kind to CK_None accordinly. Caller will use CK_None
as sentinel for bad CPU.
I'm planning to move all the CPU to feature mapping out next. As
part of that I want to devise a better way to express CPUs inheriting
features from an earlier CPU. Allowing this to be expressed in a
less rigid way than just falling through a switch. Or using gotos
as we've had to do lately.
Differential Revision: https://reviews.llvm.org/D81439
Neither gcc or icc support this. Split out from D79472. I want
to remove more, but it looks like icc does support some things
gcc doesn't and I need to double check our internal test suites.
Y is the start of several 2 letter constraints, but we also had
partial support to recognize it by itself. But it doesn't look
like it can get through clang as a single letter so the backend
support for this was effectively dead.
gcc supports selecting ymm0/zmm0 for the Yz constraint when used with 256 or 512 bit vector types.
Fixes PR45806
Differential Revision: https://reviews.llvm.org/D79448
Summary:
This patch adds a virtual method `getCPUCacheLineSize()` to `TargetInfo`. Currently, I've only implemented the method in `X86TargetInfo`. It's extremely important that each CPU's cache line size correct (e.g., we can't just define it as `64` across the board) so, it has been a little slow getting to this point.
I'll work on the ARM CPUs next, but that will probably come later in a different patch.
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74918
The validateOutputSize and validateInputSize need to check whether
AVX or AVX512 are enabled. But this can be affected by the
target attribute so we need to factor that in.
This patch moves some of the code from CodeGen to create an
appropriate feature map that we can pass to the function.
Differential Revision: https://reviews.llvm.org/D68627
The validateOutputSize and validateInputSize need to check whether
AVX or AVX512 are enabled. But this can be affected by the
target attribute so we need to factor that in.
This patch copies some of the code from CodeGen to create an
appropriate feature map that we can pass to the function. Probably
need some refactoring here to share more code with Codegen. Is
there a good place to do that? Also need to support the cpu_specific
attribute as well.
Differential Revision: https://reviews.llvm.org/D68627
All SSE capable CPUs have MMX. 3dnow implicitly enables MMX.
We have code that detects if sse is enabled and implicitly enables
MMX unless -mno-mmx is passed. So in most cases we were already
enabling MMX if march passed a CPU that supported SSE.
The exception to this is if you pass -march for a cpu supports SSE
and also pass -mno-sse. We should still enable MMX since its part
of the CPU capability.
-Deprecate -mmpx and -mno-mpx command line options
-Remove CPUID detection of mpx for -march=native
-Remove MPX from all CPUs
-Remove MPX preprocessor define
I've left the "mpx" string in the backend so we don't fail on old IR, but its not connected to anything.
gcc has also deprecated these command line options. https://www.phoronix.com/scan.php?page=news_item&px=GCC-Patch-To-Drop-MPX
Differential Revision: https://reviews.llvm.org/D66669
llvm-svn: 370393
Support -march=tigerlake for x86.
Compare with Icelake Client, It include 4 more new features ,they are
avx512vp2intersect, movdiri, movdir64b, shstk.
Patch by Xiang Zhang (xiangzhangllvm)
Differential Revision: https://reviews.llvm.org/D65840
llvm-svn: 368543
The recently added cooperlake CPU has made our already ugly switch statement even worse. There's a CPU exclusion list around the bf16 feature in the cooper lake block. I worry that we'll have to keep adding new CPUs to that until bf16 intercepts a client space CPU. We have several other exclusion lists in other parts of the switch due to skylakeserver, cascadelake, and cooperlake not having sgx. Another for cannonlake not having clwb but having all other features from skx.
This removes all these special ifs at the cost of some duplication of features and a goto. I've copied all of the skx features into either cannonlake or icelakeclient(for clwb). And pulled sklyakeserver, cascadelake, and cooperlake out of the main inheritance chain into their own chain. At the end of skylakeserver we merge back into the main chain at skylakeclient but below sgx. I think this is at least easier to follow.
Differential Revision: https://reviews.llvm.org/D63018
llvm-svn: 362965
We had an if statement that checked over every avx512* feature to see if it should enabled avx512f. Since they are all prefixed with avx512 just check for that instead.
llvm-svn: 361557
Previously we were doing this so that the 256 bit selectw builtin could be used in the implementation of the 512->256 bit conversion intrinsic.
After this commit we now use a masked convert builtin that will emit the intrinsic call and the 256-bit select from custom code in CGBuiltin. Then the header only needs to call that one intrinsic.
llvm-svn: 360924
Summary:
1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake;
2. Enable intrinsics for VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision.
For more details about BF16 intrinsic, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
Patch by LiuTianle
Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: mgorny, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D60552
llvm-svn: 360018
Use the new cx8 feature flag that was added to the backend to represent support for cmpxchg8b. Use this flag to set the MaxAtomicInlineWidth.
This also assumes all the cmpxchg instructions are enabled for CK_Generic which is what cc1 defaults to when nothing is specified.
Differential Revision: https://reviews.llvm.org/D59566
llvm-svn: 356709
PentiumPro has HasNOPL set in the backend. i686 does not.
Despite having a function that looks like it canonicalizes alias names. It
doesn't seem to be called. So I don't think this is a functional change. But its
good to be consistent between the backend and frontend.
llvm-svn: 356537
Summary:
This define should correspond to CMPXCHG16B being available which requires 64-bit mode.
I checked and gcc also seems to only define this in 64-bit mode.
Reviewers: RKSimon, spatel, efriedma, jyknight, jfb
Reviewed By: jfb
Subscribers: jfb, cfe-commits, llvm-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D59287
llvm-svn: 356118
This patch enables the following
1) AMD family 17h "znver2" tune flag (-march, -mcpu).
2) ISAs that are enabled for "znver2" architecture.
3) For the time being, it uses the znver1 scheduler model.
4) Tests are updated.
5) This patch is the clang counterpart to D58343
Reviewers: craig.topper
Tags: #clang
Differential Revision: https://reviews.llvm.org/D58344
llvm-svn: 354899
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
This is skylake-avx512 with the addition of avx512vnni ISA.
Patch by Jianping Chen
Differential Revision: https://reviews.llvm.org/D54792
llvm-svn: 347682
This silences a -Wimplicit-fallthrough warning from clang. GCC does not
appear to warn when the case body ends in a switch.
This is a somewhat surprising but intended fallthrough that I pulled out
from my mechanical patch. The code intends to handle 'Yi' and related
constraints as the 'x' constraint.
llvm-svn: 345873
I'm unsure if KNL has this feature, but the backend never thought it did, only clang did. The predefined-arch-macros test lost the check for __RTM__ on KNL when it was removed Skylake CPUs in r344117.
I think we want to drop it from KNL for consistency with Skylake anyway regardless of how we got here.
llvm-svn: 344978
Summary:
gcc defines macros such as __code_model_small_ based on the user passed command line flag -mcmodel. clang accepts a flag with the same name and similar effects, but does not generate any macro that the user can use. This cl narrows the gap between gcc and clang behaviour.
However, achieving full compatibility with gcc is not trivial: The set of valid values for mcmodel in gcc and clang are not equal. Also, gcc defines different macros for different architectures. In this cl, we only tackle an easy part of the problem and define the macro only for x64 architecture. When the user does not specify a mcmodel, the macro for small code model is produced, as is the case with gcc.
Reviewers: compnerd, MaskRay
Reviewed By: MaskRay
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D52920
llvm-svn: 344000