llvm-project

Author	SHA1	Message	Date
Kazu Hirata	1daf2994de	[llvm] Use StringRef::contains (NFC)	2023-12-23 22:21:52 -08:00
Shengchen Kan	17ff25a58e	[X86][NFC] Not infer OpSize from Xi8|16|32|64 For legacy (arithmetic) instructions, the operand size override prefix (0x66) is used to switch the operand data size from 32b to 16b (in 32/64-bit mode), 16b to 32b (in 16-bit mode). That's why we set OpSize16 for 16-bit instructions and set OpSize32 for 32-bit instructions. But it's not a generic rule any more after APX. APX adds 4 variants for arithmetic instructions: promoted EVEX, NDD (new data destination), NF (no flag), NF_NDD. All the 4 variants are in EVEX space and only legal in 64-bit mode. EVEX.pp is set to 01 for the 16-bit instructions to encode 0x66. For APX, we should set OpSizeFixed for 8/16/32/64-bit variants and set PD for the 16-bit variants. Hence, to reuse the classes ITy and its subclasses BinOp* for APX instructions, we extract the OpSize setting from the class ITy.	2023-12-24 12:00:25 +08:00
Shengchen Kan	6e20df1a3b	[X86][NFC] Set default OpPrefix to PS for XOP/VEX/EVEX instructions It helps simplify the class definitions. Now, the only explicit usage of PS is to check that 0x66/0xf2/0xf3 cannot be used as a prefix, e.g. wbinvd. See 82974e0114f02ffc07557e217d87f8dc4e100a26 for more details.	2023-12-24 10:20:40 +08:00
Felipe de Azevedo Piovezan	acacec3bbf	[LiveDebugValues][nfc] Reduce memory usage of InstrRef (#76051 ) Commit 1b531d54f623 (#74203) removed the usage of unique_ptrs of arrays in favour of using vectors, but inadvertently increased peak memory usage by removing the ability to deallocate vector memory that was no longer needed mid-LDV. In that same review, it was pointed out that `FuncValueTable` typedef could be removed, since it was "just a vector". This commit addresses both issues by making `FuncValueTable` a real data structure, capable of mapping BBs to ValueTables and able to free ValueTables as needed. This reduces peak memory usage in the compiler by 10% in the benchmarks flagged by the original review. As a consequence, we had to remove a handful of instances of the "declare-then-initialize" antipattern in unittests, as the FuncValueTable class is no longer default-constructible.	2023-12-23 13:44:45 -03:00
Florian Hahn	fbcf8a8cbb	[ConstraintElim] Add (UGE, var, 0) to unsigned system for new vars. (#76262) The constraint system used for ConstraintElimination assumes all variables to be signed. This can cause missed optimizations in the unsigned system, due to missing the information that all variables are unsigned (non-negative). Variables can be marked as non-negative by adding Var >= 0 for all variables. This is done for arguments on ConstraintInfo construction and after adding new variables. This handles cases like the ones outlined in https://discourse.llvm.org/t/why-does-llvm-not-perform-range-analysis-on-integer-values/74341 The original example shared above is now handled without this change, but adding another variable means that instcombine won't be able to simplify examples like https://godbolt.org/z/hTnra7zdY Adding the extra variables comes with a slight compile-time increase https://llvm-compile-time-tracker.com/compare.php?from=7568b36a2bc1a1e496ec29246966ffdfc3a8b87f&to=641a47f0acce7755e340447386013a2e086f03d9&stat=instructions:u stage1-O3: +0.04%, stage1-ReleaseThinLTO: +0.07%, stage1-ReleaseLTO-g: +0.05%, stage1-O0-g: +0.02%, stage2-O3: +0.05%, stage2-O0-g: +0.05%, stage2-clang: +0.05% https://github.com/llvm/llvm-project/pull/76262	2023-12-23 15:53:48 +01:00
Matt Arsenault	ed6dc62862	DAG: Handle equal size element build_vector promotion (#76213)	2023-12-23 20:43:14 +07:00
Kazu Hirata	03dc806b12	[Transforms] Use {DenseMap,SmallPtrSet}::contains (NFC)	2023-12-22 14:51:22 -08:00
Yingwei Zheng	345d7b1618	[InstCombine] Fold minmax intrinsic using KnownBits information (#76242 ) This patch tries to fold minmax intrinsic by using `computeConstantRangeIncludingKnownBits`. Fixes regression in [_karatsuba_rec:cpython/Modules/_decimal/libmpdec/mpdecimal.c](`c31943af16/Modules/_decimal/libmpdec/mpdecimal.c (L5460-L5462)`), which was introduced by #71396. See also https://github.com/dtcxzyw/llvm-opt-benchmark/issues/16#issuecomment-1865875756. Alive2 for splat vectors with undef: https://alive2.llvm.org/ce/z/J8hKWd	2023-12-23 04:41:32 +08:00
Momchil Velikov	4b6968952e	[AArch64] Implement spill/fill of predicate pair register classes (#76068 ) We are getting ICE with, e.g. ``` #include <arm_sve.h> void g(); svboolx2_t f0(int64_t i, int64_t n) { svboolx2_t r = svwhilelt_b16_x2(i, n); g(); return r; } ```	2023-12-22 15:54:12 +00:00
Nikita Popov	d82eccc752	[RegAllocFast] Avoid duplicate hash lookup (NFC)	2023-12-22 16:52:20 +01:00
Nikita Popov	658b260dbf	[Attributor] Don't construct pretty GEPs Bring this in line with other transforms like ArgPromotion/SROA/ SCEVExpander and always produce canonical i8 GEPs.	2023-12-22 16:48:13 +01:00
HaohaiWen	40ec791b15	[RegAllocFast] Refactor dominates algorithm for large basic block (#72250) The original brute-force dominates algorithm has O(n) complexity, so it is very slow for very large machine basic blocks, which are very common at O0. This patch adds InstrPosIndexes to assign an index to each instruction and uses it to determine dominance. The complexity is now O(1).	2023-12-22 23:06:16 +08:00
Lucas Duarte Prates	e4f1c52832	[AArch64] Assembly support for the Armv9.5-A Memory System Extensions (#76237 ) This implements assembly support for the Memory Systems Extensions introduced as part of the Armv9.5-A architecture version. The changes include: * New subtarget feature for FEAT_TLBIW. * New system registers for FEAT_HDBSS: * HDBSSBR_EL2 and HDBSSPROD_EL2. * New system registers for FEAT_HACDBS: * HACDBSBR_EL2 and HACDBSCONS_EL2. * New TLBI instructions for FEAT_TLBIW: * VMALLWS2E1(nXS), VMALLWS2E1IS(nXS) and VMALLWS2E1OS(nXS). * New system register for FEAT_FGWTE3: * FGWTE3_EL3.	2023-12-22 14:40:29 +00:00
Simon Pilgrim	3736e1d1cd	[SCEV] Ensure shift amount is in range before calling getZExtValue() Fixes #76234	2023-12-22 14:16:54 +00:00
Tomas Matheson	f5ab0bb148	[AArch64] paci<k>171615 auti<k>171615 assembly (#76227 ) This adds the following instructions which are added in PAuthLR: - PACIA171615 - PACIB171615 - AUTIA171615 - AUTIB171615 Also updates some encodings to match final published values. Documentation can be found here: https://developer.arm.com/documentation/ddi0602/2023-12/Base-Instructions Co-authored-by: Lucas Prates <lucas.prates@arm.com>	2023-12-22 13:54:21 +00:00
Nikita Popov	c16559137c	[IndVars] Avoid unnecessary truncate for zext nneg use When performing sext IV widening, if one of the narrow uses is in a zext nneg, we can treat it like an sext and avoid the insertion of a trunc.	2023-12-22 11:30:17 +01:00
Nikita Popov	24e80d4cc5	[IndVars] Move "using namespace" to top-level scope (NFC)	2023-12-22 11:28:54 +01:00
Matt Arsenault	f7c3627338	DAG: Implement promotion for strict_fpextend (#74310) Test is a placeholder, will be merged into the existing test after additional bugs for illegal f16 targets are fixed.	2023-12-22 17:15:52 +07:00
Lucas Duarte Prates	7109a462cd	[AArch64] Assembly support for the Armv9.5-A RAS Extensions (#76161) This implements assembly support for the RAS extensions introduced as part of the Armv9.5-A architecture version. The changes include: * New system registers for Delegated SError exceptions for EL3 (FEAT_E3DSE): * VDISR_EL3 * VSESR_EL3 More details about these extensions can be found at: * https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2023 * https://developer.arm.com/documentation/ddi0602/2023-09/ Co-authored-by: Jirui Wu <jirui.wu@arm.com> Co-authored-by: Oliver Stannard <oliver.stannard@arm.com>	2023-12-22 10:06:06 +00:00
Matt Arsenault	0e46b49de4	Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" This reverts commit c398fa009a47eb24f88383d5e911e59e70f8db86. PPC backend was fixed in 2f82662ce901c6666fceb9c6c5e0de216a1c9667	2023-12-22 16:46:22 +07:00
Nikita Popov	54067c5fbe	[SROA] Use memcpy if type size does not match store size The original memcpy also copies the padding, so make sure that this is still the case after splitting. Fixes https://github.com/llvm/llvm-project/issues/64081.	2023-12-22 10:19:22 +01:00
Wang Pengcheng	17858ce6f3	[MacroFusion] Remove createBranchMacroFusionDAGMutation (#76209) Instead, we add a `BranchOnly` parameter to indicate that only branches with their predecessors will be fused. X86 is the only user of `createBranchMacroFusionDAGMutation`.	2023-12-22 16:31:38 +08:00
Shan Huang	06a9c6738a	[CVP] Fix #76058: missing debug location in processSDiv function (#76118) This PR fixes #76058.	2023-12-22 09:26:32 +01:00
Shengchen Kan	ff32ab3ae7	[X86][NFC] Not imply TB in PS|PD|XS|XD This can help us avoid introducing new classes T_MAPPS|PD|XS|XD when a new opcode map is supported. And, T_MAPPS|PD|XS|XD does not look better than T_MAP*, PS|PD|XS|XD.	2023-12-22 15:44:30 +08:00
Aiden Grossman	a15532d764	[X86] Add CPU detection for more znver2 CPUs (#74955) This patch adds proper detection support for more znver2 CPUs. Specifically, this adds in support for CPUs codenamed Renoir, Lucienne, and Mendocino. This was originally proposed for Renoir in https://reviews.llvm.org/D96220 and got approved, but slipped through the cracks. However, there is still a demand for this feature. In addition to adding support for more znver2 CPUs, this patch also includes some additional refactoring and comments related to cpu model information for zen CPUs. Fixes https://github.com/llvm/llvm-project/issues/74934.	2023-12-21 23:39:28 -08:00
XinWang10	1d4691a233	[X86][MC] Support Enc/Dec for EGPR for promoted CMPCCXADD instruction (#76125) R16-R31 were added into GPRs in https://github.com/llvm/llvm-project/pull/70958. This patch supports the encoding/decoding for promoted CMPCCXADD instruction in EVEX space. RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4	2023-12-22 15:19:56 +08:00
Wang Pengcheng	f9c908862a	[RISCV] Split TuneShiftedZExtFusion (#76032) We split `TuneShiftedZExtFusion` into three fusions to make them reusable and match the GCC implementation[1]. The zexth/zextw fusions can be reused by XiangShan[2] and other commercial processors, but shifted zero extension is not so common. `macro-fusions-veyron-v1.mir` is renamed so it is not tied to a specific processor. References: [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637303.html [2] https://xiangshan-doc.readthedocs.io/zh_CN/latest/frontend/decode	2023-12-22 14:37:26 +08:00
wangpc	90f816e61f	[RISCV] Rename TuneVeyronFusions to TuneVentanaVeyron And fusion features are added to processor definition.	2023-12-22 14:29:31 +08:00
XinWang10	7b3323fffb	[X86][MC] Support Enc/Dec for EGPR for promoted CET instruction (#76023) R16-R31 were added into GPRs in https://github.com/llvm/llvm-project/pull/70958. This patch supports the encoding/decoding for promoted CET instruction in EVEX space. RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4	2023-12-22 14:11:32 +08:00
Matt Arsenault	4d1cd38c95	DAG: Handle promotion of fcanonicalize This avoids a regression in a future commit	2023-12-22 12:50:18 +07:00
Vitaly Buka	0ccc1e7acd	Revert "[AArch64] Fold more load.x into load.i with large offset" Issue #76202 This reverts commit f5687636415969e6d945659a0b78734abdfb0f06.	2023-12-21 21:12:40 -08:00
Matt Arsenault	248fba0cd8	AMDGPU: Remove pointless setOperationAction for xint_to_fp The legalize action for uint_to_fp/sint_to_fp uses the source integer type, not the result FP type so setting an action on an FP type does nothing.	2023-12-22 11:24:35 +07:00
Shengchen Kan	62d8ae0a1e	[X86][NFC] Remove class (VEX/EVEX/XOP)_4V and add class VVVV `VEX_4V` does not look simpler than `VEX, VVVV`. It's kind of confusing b/c classes like `VEX_L`, `VEX_LIG` do not imply `VEX` but it does. For APX, we have promoted EVEX, NDD, NF and NDD_NF instructions. All of the 4 variants are in EVEX space and NDD/NDD_NF set the VVVV fields. To extract the common fields (e.g. EVEX) into a class and set VVVV conditionally, we need VVVV to not imply other prefixes.	2023-12-22 10:38:15 +08:00
Craig Topper	e64f5d6305	[RISCV] Replace RISCVISD::VP_MERGE_VL with a new node that has a separate passthru operand. (#75682 ) ISD::VP_MERGE treats the false operand as the source for elements past VL. The vmerge instruction encodes 3 registers and treats the vd register as the source for the tail. This patch adds a new ISD opcode that models the tail source explicitly. During lowering we copy the false operand to this operand. I think we can merge RISCVISD::VSELECT_VL with this new opcode by using an UNDEF passthru, but I'll save that for another patch.	2023-12-21 14:34:49 -08:00
Derek Schuff	35a5df2de6	[WebAssembly][Object] Record section start offsets at start of payload (#76188) LLVM ObjectFile currently records the start offsets of sections as the start of the section header, whereas most other tools (WABT, emscripten, wasm-tools) record it as the start of the section content, after the header. This affects binutils tools such as objdump and nm, but not compilation/assembly (since that is driven by symbols and assembler labels which already have their values inside the section payload rather than in the header). This patch updates LLVM to match the other tools.	2023-12-21 14:16:37 -08:00
Felipe de Azevedo Piovezan	058e527434	[AccelTable][NFC] Fix typos and duplicated code (#76155 ) Renaming a member variable from "Endoding" to "Encoding". Also replace inlined code for "isNormalized" with a call to the function, so that if the definition of normalization ever changes, we only need to change the one place.	2023-12-21 16:10:30 -03:00
Arthur Eubanks	7433b1ca3e	Reapply "[X86] Set SHF_X86_64_LARGE for globals with explicit well-known large section name (#74381 )" This reverts commit 19fff858931bf575b63a0078cc553f8f93cced20. Now that explicit large globals are handled properly in the small code model.	2023-12-21 10:51:30 -08:00
Arthur Eubanks	2366d53d8d	[X86] Fix more medium code model addressing modes (#75641 ) By looking at whether a global is large instead of looking at the code model. This also fixes references to large data in the small code model. We now always fold any 32-bit offset into the addressing mode with the large code model since it uses 64-bit relocations.	2023-12-21 10:40:56 -08:00
Tomas Matheson	7bd17212ef	Re-land "[AArch64] Codegen support for FEAT_PAuthLR" (#75947 ) This reverts commit 9f0f5587426a4ff24b240018cf8bf3acc3c566ae. Fix expensive checks failure by properly marking register def for ADR.	2023-12-21 18:32:55 +00:00
David Li	f44079db22	[ISel] Add pattern matching for depositing subreg value (#75978 ) Depositing value into the lowest byte/word is a common code pattern. This patch improves the code generation for it to avoid redundant AND and OR operations.	2023-12-21 10:18:57 -08:00
Tomas Matheson	192f720178	Re-land "[AArch64] Add FEAT_PAuthLR assembler support" (#75947 ) This reverts commit 199a0f9f5aaf72ff856f68e3bb708e783252af17. Fixed the left-shift of signed integer which was causing UB.	2023-12-21 18:09:31 +00:00
Mikhail Gudim	411cba215a	Revert "[InstCombine] Extend `foldICmpBinOp` to `add`-like `or`. (#71… (#76167 ) …396)" This reverts commit 8773c9be3d9868288f1f46957945d50ff58e4e91.	2023-12-21 11:41:09 -05:00
Craig Topper	0dcff0db3a	[RISCV] Add codegen support for experimental.vp.splice (#74688 ) IR intrinsics were already defined, but no codegen support had been added. I extracted this code from our downstream. Some of it may have come from https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.	2023-12-21 08:38:32 -08:00
Tomas Matheson	199a0f9f5a	Revert "[AArch64] Add FEAT_PAuthLR assembler support" This reverts commit 934b1099cbf14fa3f86a269dff957da8e5fb619f. Buildbot failures on sanitizer-x86_64-linux-fast	2023-12-21 16:26:39 +00:00
Tomas Matheson	9f0f558742	Revert "[AArch64] Codegen support for FEAT_PAuthLR" This reverts commit 5992ce90b8c0fac06436c3c86621fbf6d5398ee5. Buildbot failures with expensive checks enabled.	2023-12-21 16:25:55 +00:00
Kazu Hirata	e01c063684	[llvm] Use DenseMap::contains (NFC)	2023-12-21 08:18:47 -08:00
Nikita Popov	a134abf4be	[ValueTracking] Make isGuaranteedNotToBeUndef() more precise (#76160 ) Currently isGuaranteedNotToBeUndef() is the same as isGuaranteedNotToBeUndefOrPoison(). This function is used in places where we only care about undef (due to multi-use issues), not poison. Make it more precise by only considering instructions that can create undef (like loads or call), and ignore those that can only create poison. In particular, we can ignore poison-generating flags. This means that inferring more flags has less chance to pessimize other transforms.	2023-12-21 16:49:37 +01:00
Nikita Popov	b8df88b41c	[InstCombine] Support zext nneg in gep of sext add fold Add m_NNegZext() and m_SExtLike() matchers to make doing these kinds of changes simpler in the future.	2023-12-21 16:38:09 +01:00
Jay Foad	8fdfd34cd2	[AMDGPU] Remove GDS and GWS for GFX12 (#76148)	2023-12-21 15:27:08 +00:00
Shengchen Kan	8eccf2b872	[X86] Set Uses = [EFLAGS] for ADCX/ADOX According to Intel SDE, ADCX reads CF and ADOX reads OF. `Uses` was set to empty by accident, the bug was not exposed b/c compiler never emits these instructions.	2023-12-21 23:01:00 +08:00
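
The RegAllocFast entry above (40ec791b15) describes replacing a linear "which instruction comes first" scan with per-instruction position indexes. The following is only a rough Python sketch of that idea, not LLVM's code: the names `PosIndexes`, `comes_before`, and `comes_before_linear` are hypothetical, and instructions are modelled as plain strings in a list standing in for one basic block.

```python
class PosIndexes:
    """Hypothetical stand-in for the idea behind InstrPosIndexes (not LLVM's API)."""

    def __init__(self, instructions):
        # Number every instruction once, in block order: O(n) setup.
        self._index = {inst: pos for pos, inst in enumerate(instructions)}

    def comes_before(self, a, b):
        # Two dictionary lookups and an integer comparison: O(1) on average.
        return self._index[a] < self._index[b]


def comes_before_linear(instructions, a, b):
    # Brute-force variant: walk the block until a or b is found, O(n) per query.
    for inst in instructions:
        if inst == b:
            return False
        if inst == a:
            return True
    raise ValueError("neither instruction is in the block")


# Usage: both variants agree, but only the indexed one avoids rescanning the block.
block = ["load", "add", "store", "ret"]
indexes = PosIndexes(block)
same_answer = indexes.comes_before("load", "store") == comes_before_linear(block, "load", "store")
```

The sketch ignores what happens when instructions are inserted or erased after numbering; a real implementation has to keep the indexes valid in that case.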

176973 Commits