llvm-project

Author	SHA1	Message	Date
Daniil Kovalev	da083e358e	[PAC][CodeGen][ELF][AArch64] Support signed GOT (#113811 ) This re-applies #96164 after revert in #102434. Support the following relocations and assembly operators: - `R_AARCH64_AUTH_ADR_GOT_PAGE` (`:got_auth:` for `adrp`) - `R_AARCH64_AUTH_LD64_GOT_LO12_NC` (`:got_auth_lo12:` for `ldr`) - `R_AARCH64_AUTH_GOT_ADD_LO12_NC` (`:got_auth_lo12:` for `add`) A `LOADgotAUTH` pseudo-instruction is introduced, which is later expanded to an actual instruction sequence like the following. ``` adrp x16, :got_auth:sym add x16, x16, :got_auth_lo12:sym ldr x0, [x16] autia x0, x16 ``` If a resign is requested, like below, `LOADgotPAC` pseudo is used, and GOT load is lowered similarly to `LOADgotAUTH`. ``` @var = global i32 0 define ptr @resign_globalvar() { ret ptr ptrauth (ptr @var, i32 3, i64 43) } ``` If FPAC bit is not set and auth instruction is emitted, a check+trap sequence similar to one used for `AUT` pseudo is emitted to ensure auth success. Both SelectionDAG and GlobalISel are supported. For FastISel, we fall back to SelectionDAG. Tests starting with 'ptrauth-' have corresponding variants w/o this prefix. See also specification https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#appendix-signed-got	2024-11-01 12:21:10 +03:00
Thorsten Schütt	8e3772744d	[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE (#114470 ) There are patterns for: * {nxv2s32, s32, s64}, * {nxv4s16, s16, s64}, * {nxv2s16, s16, s64}	2024-11-01 06:10:26 +01:00
Thorsten Schütt	aa70d846b0	[GlobalISel][AArch64] Legalize G_SPLAT_VECTOR (#114006 ) {nxv8s16, s16} fails to select. {nxv16s8, s8} no patterns available.	2024-10-31 22:20:08 +01:00
Sander de Smalen	41448c1d07	[AArch64] NFC: Add RUN line for +sve2 for sve-intrinsics-perm-select.ll The codegen for SVE and SVE2 may be different (e.g. for splice and ext). A follow-up patch will improve codegen for EXT.	2024-10-31 14:46:00 +00:00
Benjamin Maxwell	89a8c71db6	[SDAG] Support expanding `FSINCOS` to vector library calls (#114039 ) This shares most of its code with the scalar sincos expansion. It allows expanding vector FSINCOS nodes to a library call from the specified `-vector-library`. The upside of this is it will mean the vectorizer only needs to handle the sincos intrinsic, which has no memory effects, and this can handle lowering the intrinsic to a call that takes output pointers.	2024-10-31 12:41:43 +00:00
SpencerAbson	0800351da4	[AArch64][SVE] Use INS when moving elements from bottom 128b of SVE type (#114034 ) Moving elements from a scalable vector to a fixed-length vector should use [INS (vector, element)](https://developer.arm.com/documentation/100069/0606/SIMD-Vector-Instructions/INS--vector--element-) when we know that the extracted element is in the bottom 128 bits of the scalable vector. This avoids inserting unnecessary UMOV/FMOV instructions.	2024-10-31 10:36:00 +00:00
dnsampaio	28d0718033	[DAGCombiner] Add combine avg from shifts (#113909 ) This teaches DAGCombiner to fold: `(asr (add nsw x, y), 1) -> (avgfloors x, y)` `(lsr (add nuw x, y), 1) -> (avgflooru x, y)` as well as combining them into a ceil variant: `(avgfloors (add nsw x, y), 1) -> (avgceils x, y)` `(avgflooru (add nuw x, y), 1) -> (avgceilu x, y)` iff valid for the target. Removes some of the ARM MVE patterns that are now dead code. It adds the avg opcodes to `IsQRMVEInstruction` so as to preserve the immediate splatting as before.	2024-10-31 10:57:27 +01:00
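The signed folds above can be checked exhaustively at i8. This is an illustration, not LLVM code. The helper names (`avgfloors`, `asrOfAddNsw`, `avgfloorsOfAddNswAndOne`) are hypothetical and model the opcodes' semantics in plain C++. The sketch assumes arithmetic right shift of negative values, which C++20 guarantees. The unsigned `lsr`/`nuw` forms follow the same reasoning with unsigned arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics, evaluated in int so nothing overflows:
// avgfloors(x, y) = floor((x + y) / 2), avgceils(x, y) = ceil((x + y) / 2).
int8_t avgfloors(int8_t x, int8_t y) {
  return static_cast<int8_t>((static_cast<int>(x) + y) >> 1);
}
int8_t avgceils(int8_t x, int8_t y) {
  return static_cast<int8_t>((static_cast<int>(x) + y + 1) >> 1);
}

// True when x + y fits in int8_t, i.e. an i8 add of x and y could carry `nsw`.
bool addIsNsw(int8_t x, int8_t y) {
  const int sum = static_cast<int>(x) + y;
  return sum >= INT8_MIN && sum <= INT8_MAX;
}

// Input pattern of the first fold: (asr (add nsw x, y), 1) at i8.
int8_t asrOfAddNsw(int8_t x, int8_t y) {
  assert(addIsNsw(x, y) && "fold only applies to a non-wrapping add");
  const int8_t sum = static_cast<int8_t>(x + y);  // exact: no wrap
  return static_cast<int8_t>(sum >> 1);           // arithmetic shift right
}

// Input pattern of the ceil fold: (avgfloors (add nsw x, y), 1).
int8_t avgfloorsOfAddNswAndOne(int8_t x, int8_t y) {
  assert(addIsNsw(x, y) && "fold only applies to a non-wrapping add");
  return avgfloors(static_cast<int8_t>(x + y), 1);
}
```

The `nsw` requirement matters because a wrapping add would lose the carry bit that `avgfloors` keeps. Adding 1 before halving is exactly what turns the floor into a ceil.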
Thorsten Schütt	6effab990c	Revert "[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE" (#114353 ) Reverts llvm/llvm-project#114310	2024-10-31 05:41:16 +01:00
Thorsten Schütt	6bf214b7c6	[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE (#114310 ) There are patterns for: * {nxv2s32, s32, s64}, * {nxv4s16, s16, s64}, * {nxv2s16, s16, s64}	2024-10-31 04:56:41 +01:00
Thorsten Schütt	b3bb6f18bb	[GlobalISel] Import samesign flag (#114267 ) Credits: https://github.com/llvm/llvm-project/pull/111419 Fixes icmp-flags.mir First attempt: https://github.com/llvm/llvm-project/pull/113090 Revert: https://github.com/llvm/llvm-project/pull/114256	2024-10-30 19:56:25 +01:00
Thorsten Schütt	4b028773b2	Revert "[GlobalISel] Import samesign flag" (#114256 ) Reverts llvm/llvm-project#113090	2024-10-30 17:03:17 +01:00
Thorsten Schütt	72b115301d	[GlobalISel] Import samesign flag (#113090 ) Credits: https://github.com/llvm/llvm-project/pull/111419	2024-10-30 16:34:01 +01:00
Sander de Smalen	602f43686c	[AArch64] Add patterns for constructive splice. (#113912 ) SVE2 adds the constructive splice instruction, which takes a tuple. Even though the register allocator must ensure that the tuple uses consecutive registers for the tuple, it's likely to be more efficient than using the destructive splice instruction when the first operand is reused.	2024-10-30 13:17:31 +00:00
Akshat Oke	44d0e9522a	[CodeGen][NewPM] Port TailDuplicate pass to NPM (#113293 )	2024-10-30 11:48:40 +05:30
David Green	83ae171722	[AArch64] Add ComputeNumSignBits for VASHR. (#113957 ) As with a normal ISD::SRA node, they take the number of sign bits of the incoming value and increase it by the shifted amount.	2024-10-29 21:02:32 +00:00
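Below is a small constant-folding model of the rule for 16-bit lanes. `numSignBits` and `signBitsAfterAshr` are hypothetical helpers, not LLVM's `ComputeNumSignBits`. The real function returns a conservative lower bound. For a known constant the count is exact, so the model can compare for equality.

```cpp
#include <cassert>
#include <cstdint>

// Number of leading bits equal to the sign bit (always at least 1).
unsigned numSignBits(int16_t v) {
  const uint16_t bits = static_cast<uint16_t>(v);
  const unsigned sign = bits >> 15;
  unsigned count = 1;
  for (int i = 14; i >= 0 && ((bits >> i) & 1u) == sign; --i) ++count;
  return count;
}

// The SRA/VASHR rule: an arithmetic shift right by `amt` replicates the sign
// bit into `amt` more positions, capped at the element width.
unsigned signBitsAfterAshr(unsigned incoming, unsigned amt) {
  assert(incoming >= 1 && incoming <= 16);
  const unsigned result = incoming + amt;
  return result < 16 ? result : 16;
}
```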
Benjamin Maxwell	c3260c65e8	[IR] Add `llvm.sincos` intrinsic (#109825 ) This adds the `llvm.sincos` intrinsic, legalization, and lowering. The `llvm.sincos` intrinsic takes a floating-point value and returns both the sine and cosine (as a struct). ``` declare { float, float } @llvm.sincos.f32(float %Val) declare { double, double } @llvm.sincos.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val) ``` The lowering is built on top of the existing FSINCOS ISD node, with additional type legalization to allow for f16, f128, and vector values.	2024-10-29 10:52:20 +00:00
David Green	5a5b78a84e	[AArch64][GlobalISel] Lower aarch64.neon.smull/umull intrinsics. As with other nodes, we can convert these into G_UMULL and G_SMULL aarch64 instructions.	2024-10-28 18:51:10 +00:00
Simon Pilgrim	670512b5c3	[AArch64] Regenerate srem-lkk.ll to add missing asm comments Reduces diff in #112588	2024-10-28 16:00:45 +00:00
Jack Styles	86f76c3b17	[AArch64][Libunwind] Add Support for FEAT_PAuthLR DWARF Instruction (#112171 ) As part of FEAT_PAuthLR, a new DWARF Frame Instruction was introduced, `DW_CFA_AARCH64_negate_ra_state_with_pc`. This instructs Libunwind that the PC has been used with the signing instruction. This change includes three commits - Libunwind support for the newly introduced DWARF Instruction - CodeGen Support for the DWARF Instructions - Reversing the changes made in #96377. Due to `DW_CFA_AARCH64_negate_ra_state_with_pc`'s requirements to be placed immediately after the signing instruction, this would mean the CFI Instruction location was not consistent with the generated location when not using FEAT_PAuthLR. The commit reverses the changes and makes the location consistent across the different branch protection options. While this does have a code size effect, this is a negligible one. For the ABI information, see here: `853286c7ab/aadwarf64/aadwarf64.rst (id23)`	2024-10-28 08:22:38 +00:00
Serge Pavlov	819abe412d	[Test] Fix usage of constrained intrinsics (#113523 ) Some tests contain errors in constrained intrinsic usage, such as missed or extra type parameters, wrong type parameters order and some other. --------- Co-authored-by: Andy Kaylor <andy_kaylor@yahoo.com>	2024-10-28 14:07:32 +07:00
Thorsten Schütt	7b3da7b3b2	[GlobalISel][AArch64] Legalize G_ADD, G_SUB, G_AND, G_OR, and G_XOR for SVE (#110561 ) Credits: https://github.com/llvm/llvm-project/pull/72976 LLVM ERROR: cannot select: %3:zpr(<vscale x 2 x s64>) = G_MUL %0:fpr, %1:fpr (in function: xmulnxv2i64) ;; mul define void @xmulnxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, ptr %p) { entry: %c = mul <vscale x 2 x i64> %a, %b store <vscale x 2 x i64> %c, ptr %p, align 16 ret void } define void @mulnxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, ptr %p) { entry: %c = mul <vscale x 4 x i32> %a, %b store <vscale x 4 x i32> %c, ptr %p, align 16 ret void } define void @mulnxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, ptr %p) { entry: %c = mul <vscale x 8 x i16> %a, %b store <vscale x 8 x i16> %c, ptr %p, align 16 ret void } define void @mulnxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, ptr %p) { entry: %c = mul <vscale x 16 x i8> %a, %b store <vscale x 16 x i8> %c, ptr %p, align 16 ret void }	2024-10-27 23:14:07 +01:00
Gaëtan Bossu	a0c318938a	[CodeGen][NFC] Properly split MachineLICM and EarlyMachineLICM (#113573 ) Both are based on MachineLICMBase, and the functionality there is "switched" based on a PreRegAlloc flag. This commit is simply about trusting the original value of that flag, defined by the `MachineLICM` and `EarlyMachineLICM` classes. The `PreRegAlloc` flag used to be overwritten based on MRI.isSSA(), which is unreliable due to how it is inferred by the MIRParser. I see that we can now define isSSA in MIR (thanks @gargaroff ), meaning the fix isn’t really needed anymore, but redefining that flag still feels wrong. Note that I'm looking into upstreaming more changes to MachineLICM, see [the discourse thread](https://discourse.llvm.org/t/extending-post-regalloc-machinelicm/82725).	2024-10-25 11:19:22 -07:00
Tex Riddell	c03d09ce3e	[aarch64] atan2 intrinsic lowering (p5) (#112611 ) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 - `VecFuncs.def`: define intrinsic to sleef/armpl mapping - `LegalizerHelper.cpp`: add missing fewerElementsVector handling for the new atan2 intrinsic - `AArch64ISelLowering.cpp`: Add AArch64 specializations for lowering like neon instructions - `AArch64LegalizerInfo.cpp`: Legalize atan2. Part 5 for Implement the atan2 HLSL Function #70096.	2024-10-24 17:53:12 -07:00
Sander de Smalen	db0e376044	[AArch64] Fix failure with inline asm and svcount (#112537 ) This fixes an issue where the compiler runs into an assertion failure for the following example: register svcount_t pred asm("pn8") = svptrue_c8(); asm("ld1w { z0.s, z4.s, z8.s, z12.s }, %[pred]/z, [x0]\n" : : [pred] "Uph" (pred) : "memory", "cc"); Here the register constraint that ends up in the LLVM IR is "{pn8}", but the code in `TargetRegisterInfo::getRegForInlineAsmConstraint` that parses that string, follows a path where it queries a suitable register class for this register (<=> PPRorPNR regclass), for which it then chooses `nxv16i1` as a suitable type. These choices individually are correct, but the combined result isn't, because the type should be `aarch64svcount`. This then results in issues later on in SelectionDAGBuilder.cpp in CopyToReg because the type of the actual value and the computed type from the constraint don't match. This PR pre-empts this issue by parsing the predicate explicitly and returning the correct register class.	2024-10-24 17:41:07 +01:00
Benjamin Maxwell	cd0373e029	[AArch64] Allow single-element vector FP converts with +sme2p2 (#112905 ) Follow up to #112213 now that the +sme2p2 feature flag has landed. The single-element vector variants of FCVTZS, FCVTZU, UCVTF, and SCVTF are allowed in streaming SVE mode with +sme2p2. Reference: - https://developer.arm.com/documentation/ddi0602/2024-09/SIMD-FP-Instructions/FCVTZS--vector--integer---Floating-point-convert-to-signed-integer--rounding-toward-zero--vector-- - https://developer.arm.com/documentation/ddi0602/2024-09/SIMD-FP-Instructions/UCVTF--vector--integer---Unsigned-integer-convert-to-floating-point--vector-- - https://developer.arm.com/documentation/ddi0602/2024-09/SIMD-FP-Instructions/SCVTF--vector--integer---Signed-integer-convert-to-floating-point--vector--	2024-10-24 11:21:17 +01:00
SpencerAbson	629d9809ab	[LLVM][AArch64] Add assembly/disassembly for FTMOPA and BFTMOPA (#113230 ) This patch adds assembly/disassembly for the following SME2p2 instructions (part of the 2024 AArch64 ISA update) - BFTMOPA (widening) - FEAT_SME2p2 - BFTMOPA (non-widening) - FEAT_SME2p2 & FEAT_SME_B16B16 - FTMOPA (4-way) - FEAT_SME2p2 & FEAT_SME_F8F32 - FTMOPA (2-way, 8-to-16) - FEAT_SME2p2 & FEAT_SME_F8F16 - FTMOPA (2-way, 16-to-32) - FEAT_SME2p2 - FTMOPA (non-widening, f16) - FEAT_SME2p2 & FEAT_SME_F16F16 - FTMOPA (non-widening, f32) - FEAT_SME2p2 - Add new ZPR_K register class and ZK register operand - Introduce assembler extension tests for the new sme2p2 feature In accordance with: https://developer.arm.com/documentation/ddi0602/latest/ Co-authored-by: Marian Lukac marian.lukac@arm.com	2024-10-23 15:40:57 +01:00
Ricardo Jesus	8a9921f569	[AArch64] Use INDEX for constant Neon step vectors (#113424 ) When compiling for an SVE target we can use INDEX to generate constant fixed-length step vectors, e.g.: ``` uint32x4_t foo() { return (uint32x4_t){0, 1, 2, 3}; } ``` Currently: ``` foo(): adrp x8, .LCPI1_0 ldr q0, [x8, :lo12:.LCPI1_0] ret ``` With INDEX: ``` foo(): index z0.s, #0, #1 ret ``` The logic for this was already in `LowerBUILD_VECTOR`, though it was hidden under a check for `!Subtarget->isNeonAvailable()`. This patch refactors this to enable the corresponding code path unconditionally for constant step vectors (as long as we can use SVE for them).	2024-10-23 15:20:33 +01:00
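The following sketch shows only the matching step and assumes constant 32-bit elements. The helper `matchStepVector` is hypothetical. It checks that consecutive elements differ by a fixed step, which is the shape that `index z0.s, #start, #step` produces. For `{0, 1, 2, 3}` it yields start 0 and step 1, matching the `index z0.s, #0, #1` output above. The real `LowerBUILD_VECTOR` code must also confirm that SVE is usable and that the values fit INDEX's operand forms. This sketch does neither.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Start and step of a constant vector that INDEX could materialise,
// i.e. element i == start + i * step.
struct StepVector {
  int64_t start;
  int64_t step;
};

// Hypothetical helper, not LLVM's code: recognise a constant BUILD_VECTOR of
// 32-bit elements whose values form an arithmetic sequence. Differences are
// computed in 64 bits so they cannot overflow.
std::optional<StepVector> matchStepVector(const std::vector<int32_t>& elts) {
  if (elts.size() < 2) return std::nullopt;
  const int64_t start = elts[0];
  const int64_t step = static_cast<int64_t>(elts[1]) - elts[0];
  for (std::size_t i = 2; i < elts.size(); ++i)
    if (static_cast<int64_t>(elts[i]) - elts[i - 1] != step) return std::nullopt;
  return StepVector{start, step};
}
```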
Vladimir Radosavljevic	401d123a1f	[MCP] Optimize copies when src is used during backward propagation (#111130 ) Before this patch, redundant COPY couldn't be removed for the following case: ``` $R0 = OP ... ... // Read of $R0 $R1 = COPY killed $R0 ``` This patch adds support for tracking the users of the source register during backward propagation, so that we can remove the redundant COPY in the above case and optimize it to: ``` $R1 = OP ... ... // Replace all uses of $R0 with $R1 ```	2024-10-23 13:37:02 +02:00
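Here is a toy model of the transformation, far simpler than the real MachineCopyPropagation pass. It assumes a single basic block, integer register numbers, and a COPY that kills its source. `Instr` and `propagateCopyBackward` are hypothetical. The model isolates the step this commit adds: reads of the source between its definition and the COPY are renamed instead of blocking the fold.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified instruction model (not LLVM's MachineInstr):
// at most one defined register and a list of read registers.
struct Instr {
  std::string op;         // e.g. "OP", "USE", "COPY"
  int def;                // register written, or -1 if none
  std::vector<int> uses;  // registers read
};

// Try to fold `dst = COPY src` at copyIdx into the earlier definition of src.
// Assumes src is killed by the COPY (no later reads), as in the example above.
bool propagateCopyBackward(std::vector<Instr>& block, std::size_t copyIdx) {
  assert(copyIdx < block.size());
  if (block[copyIdx].op != "COPY" || block[copyIdx].uses.size() != 1) return false;
  const int dst = block[copyIdx].def;
  const int src = block[copyIdx].uses[0];

  // Find the nearest earlier definition of src in the block.
  std::size_t defIdx = copyIdx;
  bool found = false;
  while (defIdx > 0) {
    --defIdx;
    if (block[defIdx].def == src) { found = true; break; }
  }
  if (!found) return false;

  // dst must be neither read nor written between the def and the COPY.
  for (std::size_t i = defIdx + 1; i < copyIdx; ++i) {
    if (block[i].def == dst) return false;
    for (int u : block[i].uses)
      if (u == dst) return false;
  }

  // Rewrite: the def now writes dst, and reads of src in between read dst.
  block[defIdx].def = dst;
  for (std::size_t i = defIdx + 1; i < copyIdx; ++i)
    for (int& u : block[i].uses)
      if (u == src) u = dst;
  block.erase(block.begin() + static_cast<std::ptrdiff_t>(copyIdx));
  return true;
}
```

Consider the sequence `$R0 = OP`, `USE $R0`, `$R1 = COPY killed $R0`. Running the model on it leaves `$R1 = OP`, `USE $R1`.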
Paul Walker	5bb34803a4	[NFC] Migrate tests to use autoupdate for CHECK lines.	2024-10-22 12:55:15 +00:00
Paul Walker	f1ade1f874	[LLVM][CodeGen][AArch64] while_le(#,max_int) -> all_active (#111183 ) When the second operand of an incrementing while instruction is the maximum value, comparisons that include equality can never fail.	2024-10-22 12:28:39 +01:00
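The reasoning can be checked with a simplified model of the 32-bit signed WHILELE predicate. A lane stays active while the incrementing counter compares `<=` the limit, and the counter wraps at the register width. When the limit is `INT32_MAX`, no 32-bit value exceeds it, so every lane is active. `whilele32` is a hypothetical helper, not an LLVM or ACLE function, and only the 32-bit signed form is modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model of SVE WHILELE with 32-bit signed scalar operands.
std::vector<bool> whilele32(int32_t op1, int32_t op2, unsigned lanes) {
  std::vector<bool> pred(lanes, false);
  uint32_t counter = static_cast<uint32_t>(op1);  // wraps modulo 2^32
  bool active = true;
  for (unsigned e = 0; e < lanes; ++e) {
    // Reinterpret the wrapped counter as signed without implementation-defined casts.
    const int64_t value = counter >= 0x80000000u
                              ? static_cast<int64_t>(counter) - 0x100000000LL
                              : static_cast<int64_t>(counter);
    active = active && value <= op2;  // once a lane fails, later lanes stay off
    pred[e] = active;
    ++counter;
  }
  return pred;
}
```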
James Chesterman	11c818816d	[AArch64] Improve index selection for histograms (#111150 ) Removes unnecessary extends on the indices passed into histogram instructions. It also removes the instruction when the mask is zero.	2024-10-22 11:14:00 +01:00
Florian Mayer	23b18fa01e	[MTE] Do not allow local aliases to MTE globals (#106280 ) With this change and appropriate linker changes (https://r.android.com/3236256) AOSP boots with memtag-global throughout the platform. Without this change, we would sometimes generate PC-relative references to tagged globals, which then do not have the proper tag.	2024-10-21 17:00:41 -07:00
David Green	009fb567ce	[AArch64] Add patterns for combining qxtn+rshr to qrshrn Similar to bd861d0e690cfd05184d86, this adds some patterns for converting signed and unsigned variants of rshr+qxtn to qrshrn.	2024-10-21 21:06:48 +01:00
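The sketch below models only the signed 32-to-16 case, with hypothetical helper names. It assumes shift amounts of 1 to 16, the range a 32-to-16 narrowing shift accepts. Within that range the rounding shift on the 32-bit lane cannot overflow. So rounding first and saturating second, in two steps, gives the same result as the single saturating instruction.

```cpp
#include <cassert>
#include <cstdint>

// SQXTN step: saturate a wide signed value to int16_t.
int16_t saturateToInt16(int64_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return static_cast<int16_t>(v);
}

// SRSHR on a 32-bit lane: arithmetic shift right with rounding.
int32_t srshr32(int32_t x, unsigned shift) {
  assert(shift >= 1 && shift <= 16);
  const int64_t r = (static_cast<int64_t>(x) + (int64_t{1} << (shift - 1))) >> shift;
  return static_cast<int32_t>(r);  // fits: the result magnitude is at most 2^30
}

// SQRSHRN 32->16: round, shift and saturate in a single step.
int16_t sqrshrn32(int32_t x, unsigned shift) {
  assert(shift >= 1 && shift <= 16);
  return saturateToInt16((static_cast<int64_t>(x) + (int64_t{1} << (shift - 1))) >> shift);
}
```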
Ellis Hoag	e6ada7162e	[regalloc][basic] Change spill weight for optsize funcs (#112960 ) Change the spill weight calculations for `optsize` functions to remove the block frequency multiplier. For those functions, we do not want to consider the runtime cost of spilling, only the codesize cost. I built a large app with the basic and greedy (default) register allocator enabled. Size deltas (uncompressed, compressed): Basic: -303.8 KiB (-0.23%), -232.0 KiB (-0.39%); Greedy: +159.1 KiB (+0.12%), +130.1 KiB (+0.22%). Since I only saw a size win with the basic register allocator, I decided to only change the behavior for that type.	2024-10-21 11:10:50 -07:00
Spencer Abson	42ba452aa9	[NFC] Fix -WError for unused Encode/Decode ZK methods Remove the unused functions and register classes from the change below `4679583181`	2024-10-21 16:11:58 +00:00
SpencerAbson	4679583181	[LLVM][AArch64] Add register classes for Armv9.6 assembly (#111717 ) Add new register classes/operands and their encoder/decoder behaviour required for the new Armv9.6 instructions (see https://developer.arm.com/documentation/109697/2024_09/Feature-descriptions/The-Armv9-6-architecture-extension). This work is the basis of the 2024 Armv9.6 architecture update effort for SME. Co-authored-by: Caroline Concatto caroline.concatto@arm.com Co-authored-by: Marian Lukac marian.lukac@arm.com Co-authored-by: Momchil Velikov momchil.velikov@arm.com	2024-10-21 15:49:24 +01:00
David Green	bd861d0e69	[AArch64] Add some basic patterns for qshrn. With the truncssat nodes these are relatively simple tablegen patterns to add. The existing intrinsics are converted to shift+truncsat so they can lower using the new patterns. Fixes #112925.	2024-10-21 15:04:20 +01:00
Nikita Popov	e2074c60bb	[AArch64] Use implicitTrunc in isBitfieldDstMask() (NFC) This code intentionally discards the high bits, so set implicitTrunc=true. This is currently NFC but will enable an APInt assertion in the future.	2024-10-21 15:53:21 +02:00
Thorsten Schütt	10f6d01e3d	[GlobalISel][AArch64] Legalize G_EXTRACT_SUBVECTOR (#112946 ) for future combines	2024-10-19 22:42:49 +02:00
Thorsten Schütt	d8b17f2fb6	[GlobalISel] Combine G_UNMERGE_VALUES with anyext and build vector (#112370 ) G_UNMERGE_VALUES (G_ANYEXT (G_BUILD_VECTOR)) `ag G_UNMERGE_VALUES llvm/test/CodeGen/AArch64/GlobalISel | grep ANYEXT` [ANYEXT] is build vector or shuffle vector Prior art: https://reviews.llvm.org/D87117 https://reviews.llvm.org/D87166 https://reviews.llvm.org/D87174 https://reviews.llvm.org/D87427 ; CHECK-NEXT: [[BUILD_VECTOR2:%[0-9]+]]:_(<8 x s8>) = G_BUILD_VECTOR [[C2]](s8), [[C2]](s8), [[C2]](s8), [[C2]](s8), [[DEF1]](s8), [[DEF1]](s8), [[DEF1]](s8), [[DEF1]](s8) ; CHECK-NEXT: [[ANYEXT1:%[0-9]+]]:_(<8 x s16>) = G_ANYEXT [[BUILD_VECTOR2]](<8 x s8>) ; CHECK-NEXT: [[UV10:%[0-9]+]]:_(<4 x s16>), [[UV11:%[0-9]+]]:_(<4 x s16>) = G_UNMERGE_VALUES [[ANYEXT1]](<8 x s16>) Test: llvm/test/CodeGen/AArch64/GlobalISel/combine-unmerge.mir	2024-10-19 09:41:43 +02:00
David Green	7e87c2ae5d	[AArch64] Add some qshrn test cases. NFC	2024-10-18 19:05:57 +01:00
Benjamin Maxwell	5f7502bf1f	[AArch64][SVE] Support lowering fixed-length BUILD_VECTORS to ZIPs (#111698 ) This allows lowering fixed-length (non-constant) BUILD_VECTORS (<= 128-bit) to a chain of ZIP1 instructions when Neon is not available, rather than using the default lowering, which is to spill to the stack and reload. For example, ``` t5: v4f32 = BUILD_VECTOR(t0, t1, t2, t3) ``` Becomes: ``` zip1 z0.s, z0.s, z1.s // z0 = t0,t1,... zip1 z2.s, z2.s, z3.s // z2 = t2,t3,... zip1 z0.d, z0.d, z2.d // z0 = t0,t1,t2,t3,... ``` When values are already in FPRs, this generally seems to lead to a more compact output with less movement to/from the stack.	2024-10-18 10:19:22 +01:00
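The lane movement can be modelled on the low 128 bits of each register. That is all a fixed-length BUILD_VECTOR of at most 128 bits needs. `Vec128`, `zip1_s`, `zip1_d` and `buildVectorViaZips` are hypothetical names. Each tN is assumed to start in lane 0 of its own register, with the other lanes undefined; the model fills those lanes with a marker value. Float values are represented by their 32-bit patterns.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// The low 128 bits of a vector register, viewed as four 32-bit (.s) lanes.
using Vec128 = std::array<uint32_t, 4>;

// ZIP1 with .s elements: interleave the low halves, a0 b0 a1 b1.
Vec128 zip1_s(const Vec128& a, const Vec128& b) {
  return {a[0], b[0], a[1], b[1]};
}

// ZIP1 with .d elements: one 64-bit element (two .s lanes) from each input.
Vec128 zip1_d(const Vec128& a, const Vec128& b) {
  return {a[0], a[1], b[0], b[1]};
}

// BUILD_VECTOR(t0, t1, t2, t3), with each tN in lane 0 of its own register.
Vec128 buildVectorViaZips(uint32_t t0, uint32_t t1, uint32_t t2, uint32_t t3) {
  const uint32_t undef = 0xDEADBEEFu;  // marker for lanes nobody defined
  Vec128 z0{t0, undef, undef, undef}, z1{t1, undef, undef, undef};
  Vec128 z2{t2, undef, undef, undef}, z3{t3, undef, undef, undef};
  z0 = zip1_s(z0, z1);    // z0 = t0,t1,...
  z2 = zip1_s(z2, z3);    // z2 = t2,t3,...
  return zip1_d(z0, z2);  // z0 = t0,t1,t2,t3
}
```

The `.d` step is what lets three ZIP1s assemble four lanes: it moves each `.s` pair as a single 64-bit element.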
David Green	2f792f6e71	[AArch64][GlobalISel] Add some post-legalization cast combines. (#112509 ) This helps clear up some of the legalization artefacts. Not all of the cast_combines are added (notably select combines) as they currently have questionable benefit in the test updates.	2024-10-18 09:57:25 +01:00
Daniil Kovalev	6bb63002fc	[PAC] Fix address discrimination for type info vtable pointers (#102199 ) In #99726, `-fptrauth-type-info-vtable-pointer-discrimination` was introduced, which is intended to enable type and address discrimination for type_info vtable pointers. However, some codegen logic for actually enabling address discrimination was missing. This patch addresses the issue. Fixes #101716	2024-10-18 08:58:26 +03:00
Alex Rønne Petersen	ad4a582fd9	[llvm] Consistently respect `naked` fn attribute in `TargetFrameLowering::hasFP()` (#106014 ) Some targets (e.g. PPC and Hexagon) already did this. I think it's best to do this consistently so that frontend authors don't run into inconsistent results when they emit `naked` functions. For example, in Zig, we had to change our emit code to also set `frame-pointer=none` to get reliable results across targets. Note: I don't have commit access.	2024-10-18 09:35:42 +04:00
David Green	569ad7cf34	[AArch64][GlobalISel] Move UseOutlineAtomics to a bool check. NFC Similar to #111287, this moves the UseOutlineAtomics legalization rules to a boolean predicate as opposed to needing to be nested functions. There appeared to be a pair of redundant customIfs for s128 sizes (assuming only scalars are supported).	2024-10-16 19:26:57 +01:00
David Green	a3010c7791	[GlobalISel] Add boolean predicated legalization action methods. (#111287 ) Under AArch64 it is common and will become more common to have operation legalization rules dependent on a feature of the architecture. For example HasFP16 or the newer CSSC integer min/max instructions, among many others. With the current legalization rules this either means adding a custom predicate based on the feature as in `legalIf([=](const LegalityQuery &Query) { return HasFP16 && ...; })` or splitting the legalization rules into pieces that place rules optionally into them based on the features available. This patch proposes an alternative where the existing routines like legalFor(..) are provided a boolean predicate, which if false skips adding the rule. It makes the rules cleaner and will hopefully allow them to scale better as we add more features. The SVE predicates for loads/stores I have changed to just be always available. Scalable vectors without SVE have never been supported, but it could also add a condition.	2024-10-16 14:43:28 +01:00
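The shape of the proposal can be shown with a tiny stand-alone builder. `RuleSet` and its methods are hypothetical and are not LLVM's `LegalizeRuleSet` API. They only mirror the idea of a leading `bool` that disables a rule.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mini rule-set builder illustrating a boolean-predicated action.
class RuleSet {
 public:
  // Unconditional form: always marks the listed types as legal.
  RuleSet& legalFor(std::vector<std::string> types) {
    return legalFor(true, std::move(types));
  }

  // Predicated form: when `pred` is false the rule is simply not added.
  RuleSet& legalFor(bool pred, std::vector<std::string> types) {
    if (pred) legal_.insert(types.begin(), types.end());
    return *this;
  }

  bool isLegal(const std::string& ty) const { return legal_.count(ty) != 0; }

 private:
  std::set<std::string> legal_;
};
```

The sketch differs from a `legalIf` lambda in where the feature check runs. Here it is evaluated once, while the rules are built, and the type list stays visible at the call site, e.g. `fadd.legalFor({"s32", "s64"}).legalFor(HasFP16, {"s16"});`.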
Sander de Smalen	11ed7f2d3c	[AArch64] NFC: Regenerate aarch64-sve-asm.ll It should use update_mir_test_checks.py instead of update_llc_test_checks.py.	2024-10-16 12:38:55 +00:00
Lewis Crawford	f5f00764ab	[DAGCombiner] Fix check for extending loads (#112182 ) Fix a check for extending loads in DAGCombiner, where if the result type has more bits than the loaded type it should count as an extending load. All backends apart from AArch64 ignore this ExtTy argument to shouldReduceLoadWidth, so this change currently only impacts AArch64.	2024-10-16 13:23:46 +01:00
Benjamin Maxwell	4c28d21f6a	[AArch64] Avoid single-element vector fp converts in streaming[-compatible] functions (#112213 ) The single-element vector variants of FCVTZS, FCVTZU, UCVTF, and SCVTF are only supported in streaming[-compatible] functions with `+sme2p2`. Reference: - https://developer.arm.com/documentation/ddi0602/2024-09/SIMD-FP-Instructions/FCVTZS--vector--integer---Floating-point-convert-to-signed-integer--rounding-toward-zero--vector-- - https://developer.arm.com/documentation/ddi0602/2024-09/SIMD-FP-Instructions/UCVTF--vector--integer---Unsigned-integer-convert-to-floating-point--vector-- - https://developer.arm.com/documentation/ddi0602/2024-09/SIMD-FP-Instructions/SCVTF--vector--integer---Signed-integer-convert-to-floating-point--vector-- Codegen will be improved in follow up patches.	2024-10-16 10:00:49 +01:00


8305 Commits