llvm-project

Author	SHA1	Message	Date
Kazu Hirata	1daf2994de	[llvm] Use StringRef::contains (NFC)	2023-12-23 22:21:52 -08:00
Shengchen Kan	17ff25a58e	[X86][NFC] Not infer OpSize from Xi8|16|32|64 For legacy (arithmetic) instructions, the operand size override prefix (0x66) is used to switch the operand data size from 32b to 16b (in 32/64-bit mode), 16b to 32b (in 16-bit mode). That's why we set OpSize16 for 16-bit instructions and set OpSize32 for 32-bit instructions. But it's not a generic rule any more after APX. APX adds 4 variants for arithmetic instructions: promoted EVEX, NDD (new data destination), NF (no flag), NF_NDD. All the 4 variants are in EVEX space and only legal in 64-bit mode. EVEX.pp is set to 01 for the 16-bit instructions to encode 0x66. For APX, we should set OpSizeFixed for 8/16/32/64-bit variants and set PD for the 16-bit variants. Hence, to reuse the classes ITy and its subclasses BinOp* for APX instructions, we extract the OpSize setting from the class ITy.	2023-12-24 12:00:25 +08:00
Shengchen Kan	6e20df1a3b	[X86][NFC] Set default OpPrefix to PS for XOP/VEX/EVEX instructions It helps simplify the class definitions. Now, the only explicit usage of PS is to check that prefix 0x66/0xf2/0xf3 cannot be used as a prefix, e.g. wbinvd. See 82974e0114f02ffc07557e217d87f8dc4e100a26 for more details.	2023-12-24 10:20:40 +08:00
Momchil Velikov	4b6968952e	[AArch64] Implement spill/fill of predicate pair register classes (#76068 ) We are getting ICE with, e.g. ``` #include <arm_sve.h> void g(); svboolx2_t f0(int64_t i, int64_t n) { svboolx2_t r = svwhilelt_b16_x2(i, n); g(); return r; } ```	2023-12-22 15:54:12 +00:00
Lucas Duarte Prates	e4f1c52832	[AArch64] Assembly support for the Armv9.5-A Memory System Extensions (#76237 ) This implements assembly support for the Memory Systems Extensions introduced as part of the Armv9.5-A architecture version. The changes include: * New subtarget feature for FEAT_TLBIW. * New system registers for FEAT_HDBSS: * HDBSSBR_EL2 and HDBSSPROD_EL2. * New system registers for FEAT_HACDBS: * HACDBSBR_EL2 and HACDBSCONS_EL2. * New TLBI instructions for FEAT_TLBIW: * VMALLWS2E1(nXS), VMALLWS2E1IS(nXS) and VMALLWS2E1OS(nXS). * New system register for FEAT_FGWTE3: * FGWTE3_EL3.	2023-12-22 14:40:29 +00:00
Tomas Matheson	f5ab0bb148	[AArch64] paci<k>171615 auti<k>171615 assembly (#76227 ) This adds the following instructions which are added in PAuthLR: - PACIA171615 - PACIB171615 - AUTIA171615 - AUTIB171615 Also updates some encodings to match final published values. Documentation can be found here: https://developer.arm.com/documentation/ddi0602/2023-12/Base-Instructions Co-authored-by: Lucas Prates <lucas.prates@arm.com>	2023-12-22 13:54:21 +00:00
Lucas Duarte Prates	7109a462cd	[AArch64] Assembly support for the Armv9.5-A RAS Extensions (#76161 ) This implements assembly support for the RAS extensions introduced as part of the Armv9.5-A architecture version. The changes include: * New system registers for Delegated SError exceptions for EL3 (FEAT_E3DSE): * VDISR_EL3 * VSESR_EL3 More details about these extensions can be found at: * https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2023 * https://developer.arm.com/documentation/ddi0602/2023-09/ Co-authored-by: Jirui Wu <jirui.wu@arm.com> Co-authored-by: Oliver Stannard <oliver.stannard@arm.com>	2023-12-22 10:06:06 +00:00
Wang Pengcheng	17858ce6f3	[MacroFusion] Remove createBranchMacroFusionDAGMutation (#76209 ) Instead, we add a `BranchOnly` parameter to indicate that only branches with their predecessors will be fused. X86 is the only user of `createBranchMacroFusionDAGMutation`.	2023-12-22 16:31:38 +08:00
Shengchen Kan	ff32ab3ae7	[X86][NFC] Not imply TB in PS|PD|XS|XD This can help us avoid introducing new classes T_MAPPS|PD|XS|XD when a new opcode map is supported. And, T_MAPPS|PD|XS|XD does not look better than T_MAP*, PS|PD|XS|XD.	2023-12-22 15:44:30 +08:00
XinWang10	1d4691a233	[X86][MC] Support Enc/Dec for EGPR for promoted CMPCCXADD instruction (#76125 ) R16-R31 was added into GPRs in https://github.com/llvm/llvm-project/pull/70958, This patch supports the encoding/decoding for promoted CMPCCXADD instruction in EVEX space. RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4	2023-12-22 15:19:56 +08:00
Wang Pengcheng	f9c908862a	[RISCV] Split TuneShiftedZExtFusion (#76032 ) We split `TuneShiftedZExtFusion` into three fusions to make them reusable and match the GCC implementation[1]. The zexth/zextw fusions can be reused by XiangShan[2] and other commercial processors, but shifted zero extension is not so common. `macro-fusions-veyron-v1.mir` is renamed so it's not relevant to specific processor. References: [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637303.html [2] https://xiangshan-doc.readthedocs.io/zh_CN/latest/frontend/decode	2023-12-22 14:37:26 +08:00
wangpc	90f816e61f	[RISCV] Rename TuneVeyronFusions to TuneVentanaVeyron And fusion features are added to processor definition.	2023-12-22 14:29:31 +08:00
XinWang10	7b3323fffb	[X86][MC] Support Enc/Dec for EGPR for promoted CET instruction (#76023 ) R16-R31 was added into GPRs in https://github.com/llvm/llvm-project/pull/70958, This patch supports the encoding/decoding for promoted CET instruction in EVEX space. RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4	2023-12-22 14:11:32 +08:00
Vitaly Buka	0ccc1e7acd	Revert "[AArch64] Fold more load.x into load.i with large offset" Issue #76202 This reverts commit f5687636415969e6d945659a0b78734abdfb0f06.	2023-12-21 21:12:40 -08:00
Matt Arsenault	248fba0cd8	AMDGPU: Remove pointless setOperationAction for xint_to_fp The legalize action for uint_to_fp/sint_to_fp uses the source integer type, not the result FP type so setting an action on an FP type does nothing.	2023-12-22 11:24:35 +07:00
Shengchen Kan	62d8ae0a1e	[X86][NFC] Remove class (VEX/EVEX/XOP)_4V and add class VVVV `VEX_4V` does not look simpler than `VEX, VVVV`. It's kind of confusing b/c classes like `VEX_L`, `VEX_LIG` do not imply `VEX` but it does. For APX, we have promoted EVEX, NDD, NF and NDD_NF instructions. All of the 4 variants are in EVEX space and NDD/NDD_NF set the VVVV fields. To extract the common fields (e.g. EVEX) into a class and set VVVV conditionally, we need VVVV to not imply other prefixes.	2023-12-22 10:38:15 +08:00
Craig Topper	e64f5d6305	[RISCV] Replace RISCVISD::VP_MERGE_VL with a new node that has a separate passthru operand. (#75682 ) ISD::VP_MERGE treats the false operand as the source for elements past VL. The vmerge instruction encodes 3 registers and treats the vd register as the source for the tail. This patch adds a new ISD opcode that models the tail source explicitly. During lowering we copy the false operand to this operand. I think we can merge RISCVISD::VSELECT_VL with this new opcode by using an UNDEF passthru, but I'll save that for another patch.	2023-12-21 14:34:49 -08:00
Arthur Eubanks	7433b1ca3e	Reapply "[X86] Set SHF_X86_64_LARGE for globals with explicit well-known large section name (#74381 )" This reverts commit 19fff858931bf575b63a0078cc553f8f93cced20. Now that explicit large globals are handled properly in the small code model.	2023-12-21 10:51:30 -08:00
Arthur Eubanks	2366d53d8d	[X86] Fix more medium code model addressing modes (#75641 ) By looking at whether a global is large instead of looking at the code model. This also fixes references to large data in the small code model. We now always fold any 32-bit offset into the addressing mode with the large code model since it uses 64-bit relocations.	2023-12-21 10:40:56 -08:00
Tomas Matheson	7bd17212ef	Re-land "[AArch64] Codegen support for FEAT_PAuthLR" (#75947 ) This reverts commit 9f0f5587426a4ff24b240018cf8bf3acc3c566ae. Fix expensive checks failure by properly marking register def for ADR.	2023-12-21 18:32:55 +00:00
David Li	f44079db22	[ISel] Add pattern matching for depositing subreg value (#75978 ) Depositing value into the lowest byte/word is a common code pattern. This patch improves the code generation for it to avoid redundant AND and OR operations.	2023-12-21 10:18:57 -08:00
Tomas Matheson	192f720178	Re-land "[AArch64] Add FEAT_PAuthLR assembler support" (#75947 ) This reverts commit 199a0f9f5aaf72ff856f68e3bb708e783252af17. Fixed the left-shift of signed integer which was causing UB.	2023-12-21 18:09:31 +00:00
Craig Topper	0dcff0db3a	[RISCV] Add codegen support for experimental.vp.splice (#74688 ) IR intrinsics were already defined, but no codegen support had been added. I extracted this code from our downstream. Some of it may have come from https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.	2023-12-21 08:38:32 -08:00
Tomas Matheson	199a0f9f5a	Revert "[AArch64] Add FEAT_PAuthLR assembler support" This reverts commit 934b1099cbf14fa3f86a269dff957da8e5fb619f. Buildbot failures on sanitizer-x86_64-linux-fast	2023-12-21 16:26:39 +00:00
Tomas Matheson	9f0f558742	Revert "[AArch64] Codegen support for FEAT_PAuthLR" This reverts commit 5992ce90b8c0fac06436c3c86621fbf6d5398ee5. Buildbot failures with expensive checks enabled.	2023-12-21 16:25:55 +00:00
Kazu Hirata	e01c063684	[llvm] Use DenseMap::contains (NFC)	2023-12-21 08:18:47 -08:00
Jay Foad	8fdfd34cd2	[AMDGPU] Remove GDS and GWS for GFX12 (#76148 )	2023-12-21 15:27:08 +00:00
Shengchen Kan	8eccf2b872	[X86] Set Uses = [EFLAGS] for ADCX/ADOX According to Intel SDE, ADCX reads CF and ADOX reads OF. `Uses` was set to empty by accident, the bug was not exposed b/c compiler never emits these instructions.	2023-12-21 23:01:00 +08:00
Shengchen Kan	2fe94cead0	[X86][NFC] Refine code in X86InstrArithmetic.td 1. Simplify the variable name 2. Change HasOddOpcode to HasEvenOpcode b/c a. opcode of any 8-bit arithmetic instruction is even b. opcode of a 16/32/64-bit arithmetic instruction is usually odd, but it can be even sometimes, e.g. INC/DEC, ADCX/ADOX c. so that we can remove `let Opcode = o` for the mentioned corner cases.	2023-12-21 22:24:59 +08:00
Tomas Matheson	5992ce90b8	[AArch64] Codegen support for FEAT_PAuthLR - Adds a new +pc option to -mbranch-protection that will enable the use of PC as a diversifier in PAC branch protection code. - When +pauth-lr is enabled (-march=armv9.5a+pauth-lr) in combination with -mbranch-protection=pac-ret+pc, the new 9.5-a instructions (pacibsppc, retaasppc, etc) are used. Documentation for the relevant instructions can be found here: https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/ Co-authored-by: Lucas Prates <lucas.prates@arm.com>	2023-12-21 14:18:33 +00:00
Oliver Stannard	934b1099cb	[AArch64] Add FEAT_PAuthLR assembler support Add assembly/disassembly support for the new PAuthLR instructions introduced in Armv9.5-A: - AUTIASPPC/AUTIBSPPC - PACIASPPC/PACIBSPPC - PACNBIASPPC/PACNBIBSPPC - RETAASPPC/RETABSPPC - PACM Documentation for these instructions can be found here: https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/	2023-12-21 14:18:33 +00:00
Shengchen Kan	b223aebd3f	[X86][NFC] Refine code in X86InstrArithmetic.td 1. Remove redundant classes 2. Correct comments 3. Move duplicated `let` statement into class definition 4. Simplify the variable name and align the code	2023-12-21 20:50:09 +08:00
zhongyunde 00443407	f568763641	[AArch64] Fold more load.x into load.i with large offset The list of load.x refers to canFoldIntoAddrMode in D152828. Also supports LDRSroX, which was missed in canFoldIntoAddrMode	2023-12-21 18:54:15 +08:00
zhongyunde 00443407	32878c2065	[AArch64] merge index address with large offset into base address A case for this transformation, https://gcc.godbolt.org/z/nhYcWq1WE Fold mov w8, #56952 movk w8, #15, lsl #16 ldrb w0, [x0, x8] into add x0, x0, 1036288 ldrb w0, [x0, 3704] Only LDRBBroX is supported for the first time. Fix https://github.com/llvm/llvm-project/issues/71917	2023-12-21 18:54:14 +08:00
David Green	c0931d4950	[AArch64][GlobalISel] Lower scalarizing G_UNMERGE_VALUES to G_EXTRACT_VECTOR_ELT This adds post-legalizing lowering of G_UNMERGE_VALUES which take a vector and produce scalar values for each lane. They are converted to a G_EXTRACT_VECTOR_ELT for each lane, allowing all the existing tablegen patterns to apply to them. A couple of tablegen patterns need to be altered to make sure the type of the constant operand is known, so that the patterns are recognized under global isel. Closes #75662	2023-12-21 09:22:23 +00:00
Yeting Kuo	9b561ca044	[RISCV] Make performFP_TO_INTCombine fold with ISD::FRINT. (#76020 ) Fold (fp_to_int (frint X)) to (fcvt X) without rounding mode.	2023-12-21 15:03:36 +08:00
Brandon Wu	b3769adbc5	[RISCV] Fix wrong lmul for sf_vfnrclip (#76016 )	2023-12-21 13:24:26 +08:00
Shengchen Kan	b26c0ed93a	[X86][NFC] Remove class BinOpRM_ImplicitUse b/c it's used once only	2023-12-21 11:31:39 +08:00
Shengchen Kan	5fa46daab3	[X86] Replace EVEX_NoCD8 with EVEX, NoCD8 This fixes the build error after 61b58123a3137323d6876006a6171d42e5e03cc1	2023-12-21 11:05:56 +08:00
Shengchen Kan	61b58123a3	[X86][NFC] Not imply EVEX in NoCD8 NDD (new data destination) instructions need to set NoCD8 and EVEX_4V. EVEX_4V already implies EVEX. If NoCD8 implied EVEX too, we would not be able to reuse the class.	2023-12-21 10:46:25 +08:00
Craig Topper	b03f0c596a	[RISCV] Add sifive-p450 CPU. (#75760 ) This is an out of order core with no vector unit. More information: https://www.sifive.com/cores/performance-p450-470 Scheduler model and other tuning will come in separate patches.	2023-12-20 09:52:02 -08:00
Florian Hahn	b1a5ee1feb	[ARM] Check all terms in emitPopInst when clearing Restored for LR. (#75527 ) emitPopInst checks a single function exit MBB. If other paths also exit the function and any of their terminators uses LR implicitly, it is not safe to clear the Restored bit. Check all terminators for the function before clearing Restored. This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll where the machine-outliner previously introduced BLs that clobbered LR which in turn is used by the tail call return. Alternative to #73553	2023-12-20 16:56:15 +01:00
Lucas Duarte Prates	d43fc5a6ad	Reland: [AArch64] Assembly support for the Checked Pointer Arithmetic Extension (#73777 ) This introduces assembly support for the Checked Pointer Arithmetic Extension (FEAT_CPA), announced as part of the Armv9.5-A architecture version. The changes include: * New subtarget feature for FEAT_CPA * New scalar instruction for pointer arithmetic * ADDPT, SUBPT, MADDPT, and MSUBPT * New SVE instructions for pointer arithmetic * ADDPT (vectors, predicated), ADDPT (vectors, unpredicated) * SUBPT (vectors, predicated), SUBPT (vectors, unpredicated) * MADPT and MLAPT * New ID_AA64ISAR3_EL1 system register More details about the extension can be found at: * https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2023 * https://developer.arm.com/documentation/ddi0602/2023-09/ Co-authored-by: Rodolfo Wottrich <rodolfo.wottrich@arm.com>	2023-12-20 15:43:17 +00:00
Simon Pilgrim	6ec350b483	[X86] SimplifyDemandedVectorEltsForTargetShuffle - don't simplify constant mask if it has multiple uses Avoid generating extra constant vectors	2023-12-20 15:22:48 +00:00
Hassnaa Hamdi	f3dcc0cba9	[LLVM][AArch64][tblgen]: Match clamp pattern (#75529 ) Add isel pattern to replace min(max(v1,v2),v3) by clamp Add tests for uclamp, sclamp, bfclamp, fclamp.	2023-12-20 14:36:58 +00:00
Matt Arsenault	9e574a3936	DAG: Fix expansion of bf16 sourced extloads Also fix assorted vector extload failures for AMDGPU.	2023-12-20 19:24:27 +07:00
Simon Pilgrim	3974d89bde	[X86] getTargetConstantPoolFromBasePtr - drop const qualifier Return ConstantPoolSDNode instead of const ConstantPoolSDNode - doesn't affect the accessors at all and makes it easier to use result in calls expecting a SDNode.	2023-12-20 10:40:13 +00:00
Momchil Velikov	52820bdd68	[AArch64] Update target feature requirements of SVE bfloat instructions (#75596 ) According to the latest update of the ISA https://developer.arm.com/documentation/ddi0602/2023-09/?lang=en all of the affected instruction encodings now require (FEAT_SVE2 or FEAT_SME2) and FEAT_SVE_B16B16	2023-12-20 10:16:40 +00:00
Nikita Popov	9d60e95bcd	[AMDGPU] Use poison instead of undef for non-demanded elements (#75914 ) Return poison instead of undef for non-demanded lanes in the AMDGPU demanded element simplification hook. Also bail out if dmask is 0, as this case has special semantics: > If DMASK==0, the TA overrides DMASK=1 and puts zeros in VGPR followed by > LWE status if exists. TFE status is not generated since the fetch is dropped.	2023-12-20 11:01:59 +01:00
Yeting Kuo	b7376c3196	[RISCV][NFC] Add comments and tests for frint case of performFP_TO_INT_SATCombine. (#76014 ) performFP_TO_INT_SATCombine could also serve pattern (fp_to_int_sat (frint X)).	2023-12-20 14:56:28 +08:00
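
The offset arithmetic in commit 32878c2065 above (folding `mov`/`movk` plus a register-offset `ldrb` into `add` plus an immediate-offset `ldrb`) can be checked by hand. The following is a minimal Python sketch of that arithmetic only, not the LLVM implementation: the function names are hypothetical, and it assumes the `add` immediate is a 12-bit value optionally shifted left by 12 and that the `ldrb` unsigned immediate offset is 12 bits wide.

```python
# Illustrative arithmetic only; not LLVM code. All names are hypothetical.
PAGE = 1 << 12  # 4096: assumed range of the 12-bit ldrb immediate offset


def materialize_mov_movk(lo16, hi16):
    """Value built by `mov w8, #lo16` followed by `movk w8, #hi16, lsl #16`."""
    return (hi16 << 16) | lo16


def split_offset(offset):
    """Split a byte offset into (hi, lo).

    hi is a multiple of 4096 intended for the `add` immediate;
    lo is below 4096 and intended for the load's immediate offset.
    """
    hi = offset & ~(PAGE - 1)
    lo = offset & (PAGE - 1)
    return hi, lo


offset = materialize_mov_movk(56952, 15)
hi, lo = split_offset(offset)
# hi >> 12 must itself fit in 12 bits for a single shifted add (assumption).
assert (hi >> 12) < PAGE
print(offset, hi, lo)  # prints: 1039992 1036288 3704
```

The two parts match the constants in the commit message: `add x0, x0, 1036288` followed by `ldrb w0, [x0, 3704]`.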


75658 Commits