llvm-project

Author	SHA1	Message	Date
Benjamin Kramer	91487b2481	[X86][Disassembler][NFCI] Read bytes with support::endian::read	2023-01-08 18:19:49 +01:00
Benjamin Kramer	b6942a2880	[NFC] Hide implementation details in anonymous namespaces	2023-01-08 17:37:02 +01:00
Alexey Bataev	9b5f62685a	[SLP]Fix cost of the broadcast buildvector/gather. Need to include the cost of the initial insertelement to the cost of the broadcasts. Also, need to adjust the cost of the gather/buildvector if the element is inserted into poison/undef vector. Differential Revision: https://reviews.llvm.org/D140498	2023-01-06 09:25:05 -08:00
Nikita Popov	e3c2faa64a	Revert "[X86] Revert -fno-plt __tls_get_addr workaround for old GNU ld" This reverts commit 2679e8bba3e166e3174971d040b9457ec7b7d768. This change is a significant backwards-compatibility break, which does in fact break the entire Rust ecosystem, which uses an -fno-plt -mrelax-relocations=0 default. Please go through pre-commit review for this change in order to gain broader consensus.	2023-01-06 09:43:47 +01:00
serge-sans-paille	38818b60c5	Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part Use deduction guides instead of helper functions. The only non-automatic changes have been: 1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t), (uint8_t)) 2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase. 3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated. 4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that). Per reviewers' comment, some useless makeArrayRef have been removed in the process. This is a follow-up to https://reviews.llvm.org/D140896 that introduced the deduction guides. Differential Revision: https://reviews.llvm.org/D140955	2023-01-05 14:11:08 +01:00
Freddy Ye	27b8f54f51	[X86] Support -march=emeraldrapids Reviewed By: pengfei, skan Differential Revision: https://reviews.llvm.org/D140950	2023-01-05 20:27:32 +08:00
Roman Lebedev	dbce1110f1	[NFC][DAG] Move `getOpcode_EXTEND*()` helpers from X86 into SelectionDAG To be used in an upcoming patch.	2023-01-05 01:12:30 +03:00
Roman Lebedev	e4b260efb2	[Codegen][X86] `LowerBUILD_VECTOR()`: improve lowering w/ multiple FREEZE-UNDEF ops While we have great handling for UNDEF operands, FREEZE-UNDEF operands are effectively normal operands. We are better off "interleaving" such BUILD_VECTORS into a blend between a splat of FREEZE-UNDEF, and "thawed" source BUILD_VECTOR, both of which are more natural for us to handle. Refs. `f738ab9075 (r95017306)`	2023-01-04 21:16:11 +03:00
Jay Foad	6f7ff9b933	[MC] Consistently use MCInstrDesc::getImplicitUses and getImplicitDefs. NFC.	2023-01-04 13:16:12 +00:00
Fangrui Song	2679e8bba3	[X86] Revert -fno-plt __tls_get_addr workaround for old GNU ld ENABLE_X86_RELAX_RELOCATIONS has defaulted to on in 2020. This workaround is not exercised for a long time.	2022-12-31 22:39:20 -08:00
Thomas Köppe	82be8a1d2b	[X86] Emit RIP-relative access to local function in PIC medium code model Currently, the medium code model for x86_64 emits position-dependent relocations (R_X86_64_64) for local functions, regardless of PIC or no-PIC mode. (This means generically that code compiled with the medium model cannot be linked into a position-independent executable.) Example: ``` static int g(int n) { return 2 * n + 3; } void f(int(*p)(int)) { p = g; } ``` This results in: ``` Disassembly of section .text: 0000000000000000 <f>: 0: 48 b8 00 00 00 00 00 00 00 00 movabs rax, 0x0 a: 48 89 07 mov qword ptr [rdi], rax d: c3 ret ``` ``` Relocation section '.rela.text' at offset 0xf0 contains 1 entries: Offset Info Type Symbol's Value Symbol's Name + Addend 0000000000000002 0000000200000001 R_X86_64_64 0000000000000000 .text + 10 ``` This patch changes the behaviour to unconditionally emit a RIP-relative access, both in PIC and non-PIC mode. This fixes PIC mode, and is perhaps an improvement in non-PIC mode, too, since it results in a shorter instruction. A 32-bit relocation should suffice since the medium memory model demands that all code fit within 2GiB. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D140593	2022-12-28 11:14:39 -08:00
Fangrui Song	69243cdb92	Remove incorrectly implemented -mibt-seal The option from D116070 does not work as intended and will not be needed when hidden visibility is used. A function needs ENDBR if it may be reached indirectly. If we make ThinLTO combine the address-taken property (close to `!GV.use_empty() && !GV.hasAtLeastLocalUnnamedAddr()`), then the condition can be expressed with: `AddressTaken \|\| (!F.hasLocalLinkage() && (VisibleToRegularObj \|\| !F.hasHiddenVisibility()))` The current `F.hasAddressTaken()` condition does not take into account address-significance in another bitcode file or ELF relocatable file. For the Linux kernel, it uses relocatable linking. lld/ELF uses a conservative approach by setting all `VisibleToRegularObj` to true. Using the non-relocatable semantics may under-estimate `VisibleToRegularObj`. As @pcc mentioned on https://github.com/ClangBuiltLinux/linux/issues/1737#issuecomment-1343414686 , we probably need a symbol list to supply additional `VisibleToRegularObj` symbols (not part of the relocatable LTO link). Reviewed By: samitolvanen Differential Revision: https://reviews.llvm.org/D140363	2022-12-22 12:32:59 -08:00
Evgenii Kudriashov	15dd5ed96c	[X86] Support ANDNP combine through vector_shuffle Combine ``` and (vector_shuffle<Z,...,Z> (insert_vector_elt undef, (xor X, -1), Z), undef), Y -> andnp (vector_shuffle<Z,...,Z> (insert_vector_elt undef, X, Z), undef), Y ``` Reviewed By: RKSimon, pengfei Differential Revision: https://reviews.llvm.org/D138521	2022-12-22 16:55:14 +08:00
Matt Arsenault	69e75ae695	CodeGen: Don't lazily construct MachineFunctionInfo This fixes what I consider to be an API flaw I've tripped over multiple times. The point this is constructed isn't well defined, so depending on where this is first called, you can conclude different information based on the MachineFunction. For example, the AMDGPU implementation inspected the MachineFrameInfo on construction for the stack objects and if the frame has calls. This kind of worked in SelectionDAG which visited all allocas up front, but broke in GlobalISel which hasn't visited any of the IR when arguments are lowered. I've run into similar problems before with the MIR parser and trying to make use of other MachineFunction fields, so I think it's best to just categorically disallow dependency on the MachineFunction state in the constructor and to always construct this at the same time as the MachineFunction itself. A missing feature I still could use is a way to access an custom analysis pass on the IR here.	2022-12-21 10:49:32 -05:00
Craig Topper	eeb8de9363	[X86] Replace getOperand calls with an existing variable. NFC	2022-12-20 19:27:11 -08:00
Roman Lebedev	1cbcd8ad20	[X86] avx512fp16: add missing instruction selection patterns for "i16" `VMOVSH` For all other patterns, we consistently have both I and F variants, let's not diverge. Fixes https://github.com/llvm/llvm-project/issues/59628	2022-12-21 05:17:02 +03:00
Nick Desaulniers	be8fd64091	[llvm][X86ISelDAGToDAG] support -{start\|stop}-{before\|after}=x86-isel Follow a similar pattern as AMDGPUDAGToDAGISel's constructor so that we can use INITIALIZE_PASS to register a pass. This allows for more fine grain testability of SelectionDAGISel. Link: https://github.com/llvm/llvm-project/issues/59538 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D140323	2022-12-20 14:16:45 -08:00
Nick Desaulniers	ad99774a5f	[llvm][PassSupport] don't require passes to be default constructible Quite a few passes are not default constructible. In order to properly support -{start\|stop}-{before\|after}= for these passes, we would like to continue to use INITIALIZE_PASS, but not necessarily provide a default constructor. Delete the default constructors of classes derived from SelectionDAGISel. Link: https://github.com/llvm/llvm-project/issues/59538 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D140349	2022-12-20 14:07:29 -08:00
Simon Pilgrim	e16b4f5b16	[X86] Fix SLM uops/resources counts for CMPXCHG instructions LOCK + CMPXCHG8/CMPXCHG16 variants still need overriding as they are not completely correct - already much better though Based off llvm-exegesis captures, confirmed with Agner + uops.info	2022-12-20 13:07:03 +00:00
Archibald Elliott	f09cf34d00	[Support] Move TargetParsers to new component This is a fairly large changeset, but it can be broken into a few pieces: - `llvm/Support/TargetParser` are all moved from the LLVM Support component into a new LLVM Component called "TargetParser". This potentially enables using tablegen to maintain this information, as is shown in https://reviews.llvm.org/D137517. This cannot currently be done, as llvm-tblgen relies on LLVM's Support component. - This also moves two files from Support which use and depend on information in the TargetParser: - `llvm/Support/Host.{h,cpp}` which contains functions for inspecting the current Host machine for info about it, primarily to support getting the host triple, but also for `-mcpu=native` support in e.g. Clang. This is fairly tightly intertwined with the information in `X86TargetParser.h`, so keeping them in the same component makes sense. - `llvm/ADT/Triple.h` and `llvm/Support/Triple.cpp`, which contains the target triple parser and representation. This is very intertwined with the Arm target parser, because the arm architecture version appears in canonical triples on arm platforms. - I moved the relevant unittests to their own directory. And so, we end up with a single component that has all the information about the following, which to me seems like a unified component: - Triples that LLVM Knows about - Architecture names and CPUs that LLVM knows about - CPU detection logic for LLVM Given this, I have also moved `RISCVISAInfo.h` into this component, as it seems to me to be part of that same set of functionality. If you get link errors in your components after this patch, you likely need to add TargetParser into LLVM_LINK_COMPONENTS in CMake. Differential Revision: https://reviews.llvm.org/D137838	2022-12-20 11:05:50 +00:00
Simon Pilgrim	e5abaf8dec	[X86] Fix SLM uops counts for WriteBitTestSetRegRMW instructions The set/reset/complement RMW variants use +1uop compared to the BT read-only instructions Based off llvm-exegesis captures, confirmed with Agner + uops.info	2022-12-19 18:21:31 +00:00
Simon Pilgrim	c39c2cc954	[X86] Fix SLM uops counts for AES instructions Based off llvm-exegesis captures, confirmed with uops.info	2022-12-19 11:03:41 +00:00
Simon Pilgrim	e7bd805805	[X86] Add default LoadUOps argument to Intel models WriteResPair macro This will make it easier to override the folded uop count on a class-by-class basis	2022-12-19 10:44:48 +00:00
Qiu Chaofan	a40ef656d8	[Intrinsic] Rename flt.rounds intrinsic to get.rounding Address the inconsistency between FLT_ROUNDS_ and SET_ROUNDING SDAG node. Rename FLT_ROUNDS_ to GET_ROUNDING and add llvm.get.rounding intrinsic to replace flt.rounds. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D139507	2022-12-19 15:22:39 +08:00
Sergei Barannikov	4d48ccfc88	[MC] Use `MCRegister` instead of `unsigned` in `MCTargetAsmParser` Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D140273	2022-12-18 12:12:05 -08:00
Simon Pilgrim	bbf84fcf18	[X86] SandyBridge - fix ADC RMW uop count These should consistently use the fused domain count, not the unfused domain Confirmed with Agner + uops.info	2022-12-17 21:52:44 +00:00
Simon Pilgrim	ed37234f9b	[X86] Fix BMI uop/throughputs on znver1/znver2 Most BMI ops are 2uop and 0.5 throughput - interestingly TZCNTrm doesn't take an extra uop but the other instructions do Confirmed by AMD SoG + Agner	2022-12-17 20:38:40 +00:00
Simon Pilgrim	2bc2bcb246	[X86] All the WriteBLS instructions take 2uops, not 1uop Confirmed by AMD SoG + Agner + uops.info	2022-12-17 15:40:41 +00:00
Simon Pilgrim	2ee17d691f	[llvm-exegesis][X86] Use the same AGU counter estimate mapping for znver1 as znver2, and count RMW ops as well znver2 can use the ld/st dispatch counters to make a reasonable estimate for the AGU usage (although it misses complex LEA ops which I don't think we can fix), although it wasn't accounting for RMW ld-st uops which are counted separately - the same approach can be used for znver1 (ymm double-pumping ld/st agu is correctly measured as 2uops) This change is mainly academic, but was noticed as the znver1/2 models incorrectly assume scalar RMW ops take 2uops	2022-12-17 14:06:31 +00:00
Ganesh Gopalasubramanian	1f057e365f	[X86] AMD Zen 4 Initial enablement Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D139073	2022-12-17 16:15:22 +05:30
Christudasan Devadasan	b5efec4b27	[CodeGen] Additional Register argument to storeRegToStackSlot/loadRegFromStackSlot With D134950, targets get notified when a virtual register is created and/or cloned. Targets can do the needful with the delegate callback. AMDGPU propagates the virtual register flags maintained in the target file itself. They are useful to identify a certain type of machine operands while inserting spill stores and reloads. Since RegAllocFast spills the physical register itself, there is no way its virtual register can be mapped back to retrieve the flags. It can be solved by passing the virtual register as an additional argument. This argument has no use when the spill interfaces are called during the greedy allocator or even the PrologEpilogInserter and can pass a null register in such cases. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D138656	2022-12-17 11:55:34 +05:30
Fangrui Song	21c4dc7997	std::optional::value => operator*/operator-> value() has undesired exception checking semantics and calls __throw_bad_optional_access in libc++. Moreover, the API is unavailable without _LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). This fixes clang.	2022-12-17 00:42:05 +00:00
Craig Topper	c09edce1b3	[SelectionDAG] Give all the target specific subclasses of SelectionDAGISel their own pass ID. Previously we had a shared ID in SelectionDAGISel. AMDGPU has an initializePass function for its subclass of SelectionDAGISel. No other target does. This causes all target specific SelectionDAGISel passes to be known as "amdgpu-isel". I'm not sure what would happen if another target tried to implement an initializePass function too since the ID is already claimed. This patch gives all targets their own ID and passes it down to SelectionDAGISel constructor to MachineFunctionPass's constructor. Unfortunately, I think this causes most targets to lose print-before/after-all support for their SelectionDAGISel pass. And they probably no longer support start/stop-before/after. We can add initializePass functions to fix this as a follow up. NOTE: This was probably also broken if the AMDGPU target isn't compiled in. Step 1 to fixing PR59538. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D140161	2022-12-15 15:48:55 -08:00
Simon Pilgrim	37c3b83bd8	[X86] combineBitcastvxi1 - handle boolmask sign-extension through vselect See if we can freely sign-extend both sources of a vselect operand, also handle allones constant build vectors (easily rematerializable and uses in the test case). Fixes #59526	2022-12-15 16:40:44 +00:00
Matt Arsenault	c16a58b36c	Attributes: Add function getter to parse integer string attributes The most common case for string attributes parses them as integers. We don't have a convenient way to do this, and as a result we have inconsistent missing attribute and invalid attribute handling scattered around. We also have inconsistent radix usage to getAsInteger; some places use the default 0 and others use base 10. Update a few of the uses, but there are quite a lot of these.	2022-12-14 13:12:35 -05:00
Simon Pilgrim	463910ab2a	[X86] Don't fold scalar_to_vector(i64 C) -> vzext_movl(scalar_to_vector(i32 C)) Fixes constant-folding infinite loop reported by @uabelho on rG5ca77541446d	2022-12-14 12:11:06 +00:00
Simon Pilgrim	4f41ea2016	[X86] lowerShuffleAsVTRUNC - bit shift the offset elements into place instead of shuffle This helps avoid issues on non-BWI targets which can end up splitting the shuffles to 2 x 256-bit bitshifts of a smaller scalar width	2022-12-14 11:41:14 +00:00
Simon Pilgrim	b3eaf40166	[X86] lowerShuffleAsVTRUNC - improve detection of cheap/free vector concatenation Handle the case where the lo/hi subvectors are a split load.	2022-12-14 10:49:44 +00:00
Josh Stone	9b8fcd04ef	[X86] Fix cmp order in probing BuildStackAlignAND Due to reversed arguments, the loop start was almost always skipping the whole loop, since FinalStackProbed is probably less than StackPtr for large alignments. The intent was to skip the loop if the first sub on StackPtr made it less than FinalStackProbed already, so flip it. Reviewed By: serge-sans-paille Differential Revision: https://reviews.llvm.org/D139756	2022-12-13 12:10:39 -08:00
Roman Lebedev	64d46e141c	[NFC][Costmodel][X86] Replication shuffle: AVX512F can promote i1 to i32. As the added codegen test coverage shows, there isn't that much difference between AVX512DQI and baseline AVX512F codegen, DQI added `vpmovm2d`/`vpmovd2m`, but with just the Foundation we can use `vpternlogd`/`vptestmd` to do the same.	2022-12-13 21:21:07 +03:00
Roman Lebedev	ff5fcda430	[x86][Costmodel] AVX512VL: add missing costs for v8 i1<->i32 casts This would come up as a regression in the follow-up Replication-of-i1 patch. https://godbolt.org/z/fxr9Mzssr	2022-12-13 21:21:07 +03:00
Phoebe Wang	57f71dccd3	[NFC] Fix duplicated `Src`	2022-12-13 22:44:28 +08:00
Simon Pilgrim	4177e6cd4f	[X86] lowerShuffleAsVTRUNC - support offseted truncations Extend the <0,Scale,2Scale,..> pattern to allow for a fixed offset <Offset,Offset+Scale,Offset+2Scale,..> pattern, which will lower to a single additional bitshift/pshufd. At the moment I've limited this to cases where the LHS/RHS operands are concatenated for free, but this is only to avoid a couple of regressions that should be easily addressable in followups.	2022-12-13 14:00:35 +00:00
Simon Pilgrim	f6a96bee51	[X86] X86TTIImpl::getIntImmCost - use APInt::isInt/isSignedInt directly Avoid some getSExtValue()/getZExtValue() calls Hopefully we can remove some of the getBitWidth() constraints as well, as many are just there as a proxy for legal types (albeit assuming x86_64).	2022-12-12 15:32:49 +00:00
Simon Pilgrim	00a2d6e23d	[llvm-exegesis][X86] Add memory pipe counters to SLM model There might not be any exposed alu pipe counters for us to measure - but the sum of load/store uop counters seems to give a really good approximation to memory controller usage - even for more complex instructions like cmpxchg	2022-12-12 12:09:11 +00:00
Simon Pilgrim	ba237cb268	Revert rG6a0bbb84cef28ed642a730e55c52447b8c870647 "[X86] RDRAND is a Goldmont feature, not Silvermont" RDRAND is a Silvermont feature - confirmed with CPUID	2022-12-11 18:19:31 +00:00
Roman Lebedev	680b33b66e	[X86] AMD Zen 3 sched model: FMA ops have inverse throughput of 0.5 Now that exegesis produces meaningful snippets to measure throughput for instructions with tied operands: `2ffe225d11` the measurements clearly show these instructions to have more optimistic throughput. There's still some noise in the reports, especially around instructions with memory operands. I'm not sure if we measure those correctly. Fixes https://github.com/llvm/llvm-project/issues/59325	2022-12-11 21:12:55 +03:00
Simon Pilgrim	6a0bbb84ce	[X86] RDRAND is a Goldmont feature, not Silvermont	2022-12-11 12:28:22 +00:00
Simon Pilgrim	b3c7e43d04	[X86] Fix missing HasPRFCHW predicate This was declared in FeaturePRFCHW but never defined. Noticed while preparing to add Unsupported features handling to X86 scheduler models.	2022-12-11 11:06:10 +00:00
Simon Pilgrim	d75980f807	[X86] Fix missing HasX86_64 predicate This was declared in FeatureX86_64 but never defined (we use the *64BitMode predicates for instruction defs - but now we need it for scheduler model defs). Noticed while preparing to add Unsupported features handling to X86 scheduler models.	2022-12-11 10:27:03 +00:00

23116 Commits