llvm-project

Author	SHA1	Message	Date
Phoebe Wang	24194090e1	[X86][RFC] Add new option `-m[no-]evex512` to disable ZMM and 64-bit mask instructions for AVX512 features This is an alternative to D157485 and a prerequisite for supporting AVX10. AVX10 Architecture Specification: https://cdrdv2.intel.com/v1/dl/getContent/784267 AVX10 Technical Paper: https://cdrdv2.intel.com/v1/dl/getContent/784343 RFC: https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661 Based on feedback from the LLVM and GCC communities, we have agreed to start by supporting `-m[no-]evex512` on existing AVX512 features. The option `-mno-evex512` can be used with `-mavx512xxx` to build binaries that can run on both legacy AVX512 targets and AVX10-256. There is still disagreement about the expected behavior when this option and `-mavx512xxx` are used together with `-mavx10.1-256`. We decided to defer support for `-mavx10.1` until we reach consensus. Alternatively, we may start from supporting AVX10.2 and not provide any AVX10.1 options. Reviewed By: RKSimon, skan Differential Revision: https://reviews.llvm.org/D159250	2023-09-08 22:47:22 +08:00
Phoebe Wang	0856efbf88	Revert "[X86][RFC] Add new option `-m[no-]evex512` to disable ZMM and 64-bit mask instructions for AVX512 features" This reverts commit 7dd48cc24de2d54d40527432cbee8a9d97a8a4f7. Causing buildbot failure.	2023-09-07 21:59:01 +08:00
Phoebe Wang	7dd48cc24d	[X86][RFC] Add new option `-m[no-]evex512` to disable ZMM and 64-bit mask instructions for AVX512 features This is an alternative to D157485 and a prerequisite for supporting AVX10. AVX10 Architecture Specification: https://cdrdv2.intel.com/v1/dl/getContent/784267 AVX10 Technical Paper: https://cdrdv2.intel.com/v1/dl/getContent/784343 RFC: https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661 Based on feedback from the LLVM and GCC communities, we have agreed to start by supporting `-m[no-]evex512` on existing AVX512 features. The option `-mno-evex512` can be used with `-mavx512xxx` to build binaries that can run on both legacy AVX512 targets and AVX10-256. There is still disagreement about the expected behavior when this option and `-mavx512xxx` are used together with `-mavx10.1-256`. We decided to defer support for `-mavx10.1` until we reach consensus. Alternatively, we may start from supporting AVX10.2 and not provide any AVX10.1 options. Reviewed By: RKSimon, skan Differential Revision: https://reviews.llvm.org/D159250	2023-09-07 21:38:35 +08:00
Fangrui Song	bf6e39367c	[X86] Clean up GlobalISel headers. NFC	2023-08-21 23:55:08 -07:00
Noah Goldstein	cecaf29589	Adding tuning flags for int <-> fp domain switching penalties; NFC Atom - No domain switching penalties Nehalem+ - No penalty on moves Haswell+ - No penalty on moves / shuffles Skylake+ - No penalty on moves / shuffles / blends Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D143859	2023-02-27 18:53:25 -06:00
Archibald Elliott	62c7f035b4	[NFC][TargetParser] Remove llvm/ADT/Triple.h I also ran `git clang-format` to get the headers in the right order for the new location, which has changed the order of other headers in two files.	2023-02-07 12:39:46 +00:00
Ilya Tokar	d7043e8c41	[X86] Add support for "light" AVX AVX/AVX512 instructions may cause frequency drop on e.g. Skylake. The magnitude of frequency/performance drop depends on instruction (multiplication vs load/store) and vector width. Currently, users who want to avoid this drop can specify -mprefer-vector-width=128. However, this also prevents generation of 256-bit wide instructions that have no associated frequency drop (mainly load/stores). Add a tuning flag that allows generation of 256-bit AVX load/stores, even when -mprefer-vector-width=128 is set, to speed up memcpy & co. Verified that running memcpy loop on all cores has no frequency impact and zero CORE_POWER:LVL[12]_TURBO_LICENSE perf counters. Makes copying memory faster, e.g.: BM_memcpy_aligned/256 80.7GB/s ± 3% 96.3GB/s ± 9% +19.33% (p=0.000 n=9+9) Differential Revision: https://reviews.llvm.org/D134982	2023-01-24 17:02:46 -05:00
Simon Pilgrim	527e453a5b	[X86] Add HasCLFLUSH pseudo-predicate (Issue #19039) Similar to what we've done for HasMFence - this puts into place a pseudo-predicate for CLFLUSH instructions that separates it from HasSSE2 to make it easier to use CLFLUSH even when SSE/fpmath has been disabled - technically CLFLUSH has its own CPUID bit, so could be available on x86 cores entirely without SSE, but I don't think that's ever happened or is likely to happen.	2022-12-08 13:51:14 +00:00
Phoebe Wang	62ca79102c	[X86][1/2] Support PREFETCHI instructions For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D136040	2022-10-20 08:46:01 +08:00
Benjamin Kramer	08dc847f33	Add missing `override`s after aad013de41c0	2022-10-14 10:38:32 +02:00
Paul Robinson	77b00098f2	[PS5] Use same debug trap instruction as PS4	2022-06-16 11:03:03 -07:00
Shengchen Kan	e58dadf3e2	[X86][NFC] Generate fields and getters for subtarget features Non-duplicated comments are moved from X86Subtarget.h to X86.td. This is a follow-up patch for D120906.	2022-03-20 15:27:21 +08:00
Shengchen Kan	ae0ae91903	[X86][NFC] Remove unused variable UseAA	2022-03-20 13:21:25 +08:00
Shengchen Kan	c266776429	[X86][NFC] Remove unused feature UseAA	2022-03-20 13:14:13 +08:00
Shengchen Kan	076a9dc99a	[X86][NFC] Rename hasCMOV() to canUseCMOV(), hasLAHFSAHF() to canUseLAHFSAHF() To make them less like other feature functions. This is a follow-up patch for D121978.	2022-03-20 12:00:25 +08:00
Craig Topper	57b41af838	[X86] Rename FeatureCMPXCHG8B/FeatureCMPXCHG16B to FeatureCX8/CX16 to match CPUID. Rename hasCMPXCHG16B() to canUseCMPXCHG16B() to make it less like other feature functions. Add a similar canUseCMPXCHG8B() that aliases hasCX8() to keep similar naming. Differential Revision: https://reviews.llvm.org/D121978	2022-03-19 12:34:06 -07:00
Shengchen Kan	920c2e5763	[X86][NFC] Rename target feature hasCMov->hasCMOV This is a follow-up patch for D121975.	2022-03-18 14:05:52 +08:00
Craig Topper	6cfe41dcc8	[X86] Rename more target feature related things for consistency. NFC -Rename ModeBit to IsBit to match X86Subtarget. -Rename FeatureLAHFSAHF to FeatureLAHFSAHF64 to match X86Subtarget. -Use consistent capitalization Reviewed By: skan Differential Revision: https://reviews.llvm.org/D121975	2022-03-17 22:27:17 -07:00
Shengchen Kan	052d37dc7c	[NFC][X86] Rename some variables and functions about target features This is preparation for D121768. The member's name should align with the interface for trivial target features.	2022-03-16 13:08:52 +08:00
Mircea Trofin	294eca35a0	[regalloc] Remove -consider-local-interval-cost Discussed extensively on D98232. The functionality introduced in D35816 never worked correctly. In D98232, it was fixed, but, as it was introducing a large compile-time regression, and the value of the original patch was called into doubt, we disabled it by default everywhere. A year later, it appears that caused no grief, so it seems safe to remove the disabled code. This should be accompanied by re-opening bug 26810. Differential Revision: https://reviews.llvm.org/D121128	2022-03-14 10:49:16 -07:00
Paul Robinson	7b85f0f32f	[PS4] isPS4 and isPS4CPU are not meaningfully different	2022-03-03 11:36:59 -05:00
Sanjay Patel	40a50f8701	[x86] avoid false dependency stall on 'sbb' with same source reg This is effectively inverting the transform added with D116804 because the downside of the false dependency of something like "sbb %eax, %eax" is much greater than the upside of eliminating a zeroing instruction on (all?) Intel CPUs. Differential Revision: https://reviews.llvm.org/D118843	2022-02-07 10:12:12 -05:00
James Farrell	219672b8dd	Revert "Revert "Use VersionTuple for parsing versions in Triple, fixing issues that caused the original change to be reverted. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible."" This reverts commit 63a6348cad6caccf285c1661bc60d8ba5a40c972. Differential Revision: https://reviews.llvm.org/D115254	2021-12-07 23:15:21 +00:00
James Farrell	63a6348cad	Revert "Use VersionTuple for parsing versions in Triple, fixing issues that caused the original change to be reverted. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible." This reverts commit 50324670342d9391f62671685f4d6b4880a4ea9a.	2021-12-06 17:35:26 +00:00
James Farrell	5032467034	Use VersionTuple for parsing versions in Triple, fixing issues that caused the original change to be reverted. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible. This reverts commit 40d5eeac6cd89a2360c3ba997cbaa816abca828c. Differential Revision: https://reviews.llvm.org/D114885	2021-12-06 14:57:47 +00:00
Nikita Popov	40d5eeac6c	Revert "Use VersionTuple for parsing versions in Triple. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible." This reverts commit 1e8286467036d8ef1a972de723f805a4981b2692. llvm/test/Transforms/LoopStrengthReduce/X86/2009-11-10-LSRCrash.ll fails with assertion failure: llc: /home/nikic/llvm-project/llvm/include/llvm/ADT/Optional.h:196: T& llvm::optional_detail::OptionalStorage<T, true>::getValue() & [with T = unsigned int]: Assertion `hasVal' failed. ... #8 0x00005633843af5cb llvm::MCStreamer::emitVersionForTarget(llvm::Triple const&, llvm::VersionTuple const&) #9 0x0000563383b47f14 llvm::AsmPrinter::doInitialization(llvm::Module&)	2021-11-30 18:36:32 +01:00
James Farrell	1e82864670	Use VersionTuple for parsing versions in Triple. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible. See also https://github.com/android/ndk/issues/1455. Differential Revision: https://reviews.llvm.org/D114163	2021-11-30 15:44:23 +00:00
Simon Pilgrim	9fc523d114	[X86] Remove X86ProcFamilyEnum::IntelSLM Replace X86ProcFamilyEnum::IntelSLM enum with a TuningUseSLMArithCosts flag instead, matching what we already do for Goldmont. This just leaves X86ProcFamilyEnum::IntelAtom to replace with general Tuning/Feature flags and we can finally get rid of the old X86ProcFamilyEnum enum. Differential Revision: https://reviews.llvm.org/D112079	2021-10-20 11:58:39 +01:00
Matt Morehouse	431a5d8411	[x86] Implement a tagged-globals backend feature. The feature tells the backend to allow tags in the upper bits of global variable addresses. These tags will be ignored by upcoming CPUs with the Intel LAM feature but may be used in instrumentation passes (e.g., HWASan). This patch implements the feature by using @GOTPCREL relocations instead of direct references to the locally defined global. Thus the full tagged address can be loaded by a single instruction: movq global@GOTPCREL(%rip), %rax Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D111343	2021-10-18 13:31:10 -07:00
Tim Northover	5d070c8259	SwiftAsync: use runtime-provided flag for extended frame if back-deploying When back-deploying Swift async code we can't always toggle the flag showing an extended frame is present because it will confuse unwinders on systems released before this feature. So in cases where the code might run there, we `or` in a mask provided by the runtime (as an absolute symbol) telling us whether the unwinders can cope. When deploying only for newer OSs, we can still hard-code the bit-set for greater efficiency.	2021-09-13 13:54:46 +01:00
Tianqing Wang	12fa608af4	[X86] Add CRC32 feature. d8faf03807ac implemented general-regs-only for X86 by disabling all features with vector instructions. But the CRC32 instruction in SSE4.2 ISA, which uses only GPRs, also becomes unavailable. This patch adds a CRC32 feature for this instruction and allows it to be used with general-regs-only. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D105462	2021-09-06 17:24:30 +08:00
Wang, Pengfei	6f7f5b54c8	[X86] AVX512FP16 instructions enabling 1/6 1. Enable FP16 type support and basic declarations used by following patches. 2. Enable new instructions VMOVW and VMOVSH. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105263	2021-08-10 12:46:01 +08:00
Harald van Dijk	75521bd9d8	[X32] Add Triple::isX32(), use it. So far, support for x86_64-linux-gnux32 has been handled by explicit comparisons of Triple.getEnvironment() to GNUX32. This worked as long as x86_64-linux-gnux32 was the only X32 environment to worry about, but we now have x86_64-linux-muslx32 as well. To support this, this change adds an isX32() function and uses it. It replaces all checks for GNUX32 or MuslX32 by isX32(), except for the following: - Triple::isGNUEnvironment() and Triple::isMusl() are supposed to treat GNUX32 and MuslX32 differently. - computeTargetTriple() needs to be able to transform triples to add or remove X32 from the environment and needs to map GNU to GNUX32, and Musl to MuslX32. - getMultiarchTriple() completely lacks any Musl support and retains the explicit check for GNUX32 as it can only return x86_64-linux-gnux32. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D103777	2021-06-07 20:48:39 +01:00
Roman Lebedev	cf9b1f7a0e	[X86] Split FeatureFastVariableShuffle tuning into Lane-Crossing and Per-Lane variants Currently, X86 backend only has a global one-size-fits-all `FeatureFastVariableShuffle` feature, which controls profitability of both the cross-lane and per-lane variable shuffles. I guess, this has been fine so far. But at least on AMD Zen 3, per-lane variable shuffles (e.g. `VPSHUFB`) are as fast as shuffles with a fixed/immediate mask, while lane-crossing shuffles (e.g. `VPERMPS`) perform worse. So to get the benefits of variable-mask shuffles, but not the drawbacks of lane-crossing shuffles, as suggested by @RKSimon, split the feature flag into two. Differential Revision: https://reviews.llvm.org/D103274	2021-06-01 10:39:36 +03:00
Tim Northover	82a0e808bb	IR/AArch64/X86: add "swifttailcc" calling convention. Swift's new concurrency features are going to require guaranteed tail calls so that they don't consume excessive amounts of stack space. This would normally mean "tailcc", but there are also Swift-specific ABI desires that don't naturally go along with "tailcc" so this adds another calling convention that's the combination of "swiftcc" and "tailcc". Support is added for AArch64 and X86 for now.	2021-05-17 10:48:34 +01:00
Roman Lebedev	b1c38207e9	[X86] Improve costmodel for scalar byte swaps Currently we model i16 bswap as very high cost (`10`), which doesn't seem right, with all others being at `1`. Regardless of `MOVBE`, i16 reg-reg bswap is lowered into (an extending move plus) rot-by-8: https://godbolt.org/z/8jrq7fMTj I think it should at worst have throughput of `1`: Since i32/i64 already have cost of `1`, `MOVBE` doesn't improve their costs any further. BUT, `MOVBE` must have at least a single memory operand, with the other being a register. Which means, if we have a bswap of load, iff load has a single use, we'll fold bswap into load. Likewise, if we have store of a bswap, iff bswap has a single use, we'll fold bswap into store. So I think we should treat such a bswap as free, unless of course we know that for the particular CPU they are performing badly. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D101924	2021-05-08 15:17:35 +03:00
Mircea Trofin	ce61def529	[regalloc] Ensure Query::collectInterferringVregs is called before interval iteration The main part of the patch is the change in RegAllocGreedy.cpp: Q.collectInterferringVregs() needs to be called before iterating the interfering live ranges. The rest of the patch helps ensure that is the case: instead of clearing the query's InterferingVRegs field, we invalidate it. The clearing happens when the live reg matrix is invalidated (existing triggering mechanism). Without the change in RegAllocGreedy.cpp, the compiler ICEs. This patch should make it more easily discoverable by developers that collectInterferringVregs needs to be called before iterating. I will follow up with a subsequent patch to improve the usability and maintainability of Query. Differential Revision: https://reviews.llvm.org/D98232	2021-04-01 08:33:28 -07:00
Luo, Yuanke	f80b29878b	[X86] AMX programming model. This patch implements the AMX programming model that was discussed on llvm-dev (http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html). Thanks to Hal for the good suggestion in the RA. The fast RA is not in the patch yet. This patch implements 7 components. 1. The C interface for the end user. 2. The AMX intrinsics in LLVM IR. 3. Transform load/store <256 x i32> to AMX intrinsics or split the type into two <128 x i32>. 4. The lowering from AMX intrinsics to AMX pseudo instructions. 5. Insert pseudo ldtilecfg and build the def-use between ldtilecfg and AMX instructions. 6. The register allocation for tile registers. 7. Morph AMX pseudo instructions to AMX real instructions. Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0 Differential Revision: https://reviews.llvm.org/D87981	2020-12-10 17:01:54 +08:00
Liu, Chen3	756f597841	[X86] Support Intel avxvnni This patch mainly made the following changes: 1. Support AVX-VNNI instructions; 2. Introduce ExplicitVEXPrefix flag so that vpdpbusd/vpdpbusds/vpdpwssd/vpdpwssds instructions only use vex-encoding when the user explicitly adds the {vex} prefix. Differential Revision: https://reviews.llvm.org/D89105	2020-10-31 12:39:51 +08:00
Tianqing Wang	be39a6fe6f	[X86] Add User Interrupts(UINTR) instructions For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D89301	2020-10-22 17:33:07 +08:00
Wang, Pengfei	412cdcf2ed	[X86] Add HRESET instruction. For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D89102	2020-10-13 08:47:26 +08:00
Xiang1 Zhang	413577a879	[X86] Support Intel Key Locker Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access to the raw key value by converting AES keys into “handles”. These handles can be used to perform the same encryption and decryption operations as the original AES keys, but they only work on the current system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot), then any previous handles can no longer be used. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D88398	2020-09-30 18:08:45 +08:00
Hiroshi Yamauchi	28ccc52c40	[X86] Add feature for Fast Short REP MOV (FSRM) for Icelake or newer. Differential Revision: https://reviews.llvm.org/D85989	2020-08-19 13:39:42 -07:00
Craig Topper	c7a0b2684f	[X86][MC][Target] Initial backend support for a tune CPU to support -mtune This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line. This patch adds MC layer support for a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables. These feature lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all targets as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned. One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU. I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning. Differential Revision: https://reviews.llvm.org/D85165	2020-08-14 15:31:50 -07:00
Roman Lebedev	e1dd212c87	[X86] Remove disabled miscompiling X86CondBrFolding pass As briefly discussed in IRC with @craig.topper, the pass has been disabled basically since its original introduction (Nov 2018) due to known correctness issues (miscompilations), and there hasn't been much work done to fix that. While I won't promise that I will "fix" the pass, I have looked at it previously, and I'm sure I won't try to fix it if that requires actually fixing this existing code. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D84775	2020-07-28 23:35:04 +03:00
Craig Topper	1a75d88b3e	[X86] Move getGatherOverhead/getScatterOverhead into X86TargetTransformInfo. These cost methods don't make much sense in X86Subtarget. Make them methods in X86's TTI and move the feature checks from the X86Subtarget constructor into these methods. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D84594	2020-07-26 10:38:42 -07:00
Craig Topper	14c59b4577	[X86] Remove getProcFamily() method from X86Subtarget. NFC This isn't used and we've decided in the past that a CPU enum for tuning is not a good idea.	2020-07-25 22:11:45 -07:00
Craig Topper	8158f0cefe	[X86] Use X86_MC::ParseX86Triple to add mode features to feature string in X86Subtarget::initSubtargetFeatures. Remove mode flags from constructor and remove calls to ToggleFeature for the mode bits. By adding them to the feature string we handle initializing the mode member variables in X86Subtarget and the feature bits in MCSubtargetInfo in one shot.	2020-07-24 10:48:22 -07:00
Craig Topper	205e8b7e89	[X86] Make the X86ProcFamilyEnum private to X86Subtarget. Removed unneeded 'protected' from X86Subtarget. NFC	2020-07-23 23:42:11 -07:00
Craig Topper	ebe5f17f9c	[X86] Remove the DeprecatedMPX feature flag. We deprecated mpx feature in 10.0. I left this feature flag in case someone still had IR files containing the feature in a target-feature attribute. At the time I think I thought it would fail the test if the feature couldn't be found. Further review suggests that at worst it prints a message to stderr about ignoring the feature.	2020-07-22 17:44:07 -07:00

480 Commits