llvm-project

Author	SHA1	Message	Date
Dinar Temirbulatov	990c4bc95f	[AArch64][SVE2] Generate SVE2 BSL instruction in LLVM for bit-twiddling. (#83514) Allow folding or/and-and into a BSL instruction for scalable vectors.	2024-04-10 11:07:59 +01:00
Sizov Nikita	d38bff460a	[AArch64] SimplifyDemandedBitsForTargetNode - add AArch64ISD::BICi handling (#76644 ) Fold BICi if all destination bits are already known to be zeroes ```llvm define <8 x i16> @haddu_known(<8 x i8> %a0, <8 x i8> %a1) { %x0 = zext <8 x i8> %a0 to <8 x i16> %x1 = zext <8 x i8> %a1 to <8 x i16> %hadd = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %x0, <8 x i16> %x1) %res = and <8 x i16> %hadd, <i16 511, i16 511, i16 511, i16 511,i16 511, i16 511, i16 511, i16 511> ret <8 x i16> %res } declare <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16>, <8 x i16>) ``` ``` haddu_known: // @haddu_known ushll v0.8h, v0.8b, #0 ushll v1.8h, v1.8b, #0 uhadd v0.8h, v0.8h, v1.8h bic v0.8h, #254, lsl #8 <-- this one will be removed as we know high bits are zero extended ret ``` Fixes #53881 Fixes #53622	2024-04-06 21:41:24 +01:00
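The known-bits argument behind the BICi fold can be checked with a small scalar model. This is only a sketch in Python, not LLVM code: `uhadd16`, `BIC_MASK`, and `bic_is_redundant` are hypothetical names, and the model covers one 16-bit lane of the `haddu_known` example.

```python
def uhadd16(a: int, b: int) -> int:
    """Unsigned halving add on one 16-bit lane: (a + b) >> 1, no overflow."""
    return ((a + b) >> 1) & 0xFFFF

# Bits cleared by `bic v0.8h, #254, lsl #8`, i.e. 0xFE00 (the complement of 511).
BIC_MASK = 254 << 8

def bic_is_redundant() -> bool:
    """Check every pair of inputs that were zero-extended from 8 bits."""
    for a in range(256):
        for b in range(256):
            h = uhadd16(a, b)
            # Clearing the BIC_MASK bits must leave the value unchanged.
            if (h & ~BIC_MASK & 0xFFFF) != h:
                return False
    return True
```

Because both operands are at most 255, the halving add is at most 255 too, so the bits the BIC would clear are already zero.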
Daniel Paoliello	43ba568daa	Prepend all library intrinsics with `#` when building for Arm64EC (#87542 ) While attempting to build some Rust code, I was getting linker errors due to missing functions that are implemented in `compiler-rt`. Turns out that when `compiler-rt` is built for Arm64EC, all its function names are mangled with the leading `#`. This change removes the hard-coded list of library-implemented intrinsics to mangle for Arm64EC, and instead assumes that they all must be mangled.	2024-04-05 12:06:47 -07:00
Paul Kirth	f0724f0704	[llvm][NFC] Update URL in comment about Android ABI The previous URL was stale, and referenced 'master' instead of 'main', which will never be updated. Reviewers: topperc, enh-google Reviewed By: enh-google Pull Request: https://github.com/llvm/llvm-project/pull/87726	2024-04-05 09:41:53 -07:00
Prabhuk	212b1a84a6	[CallSiteInfo][NFC] CallSiteInfo -> CallSiteInfo.ArgRegPairs (#86842) CallSiteInfo is originally used only for argument - register pairs. Make it a struct, in which we can store additional data for call sites. Also, the variables/methods used for CallSiteInfo are named for its original use case, e.g., CallFwdRegsInfo. Refactor these for the upcoming use, e.g. addCallArgsForwardingRegs() -> addCallSiteInfo(). An upcoming patch will add type ids for indirect calls to propagate them from middle-end to the back-end. The type ids will be then used to emit the call graph section. Original RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151044.html Updated RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-July/151739.html Differential Revision: https://reviews.llvm.org/D107109?id=362888 Co-authored-by: Necip Fazil Yildiran <necip@google.com>	2024-04-02 13:05:16 -07:00
Sander de Smalen	f914e8e77c	[AArch64][SME] Add coalescer barrier for args/results in locally streaming functions. (#85388 ) Similar to how we protected FP/fixed-vector arguments and results from calls, we should do the same for arguments/results from locally-streaming functions such that those are not spilled/filled as ZPR registers. This may cause a small regression (additional spills/fills), which is addressed by #85386.	2024-03-26 11:40:31 +00:00
David Green	96819daa3d	[AArch64] Handle v2i16 and v2i8 in concat load combine. (#86264 ) This extends the concat load patch from https://reviews.llvm.org/D121400, which was later moved to a combine, to handle v2i8 and v2i16 concat loads too.	2024-03-25 17:10:23 +00:00
Graham Hunter	36a3f8f647	[TTI][TLI][AArch64] Support scalable immediates with isLegalAddImmediate (#84173 ) Adds a second parameter (default to 0) to isLegalAddImmediate, to represent a scalable immediate. Extends the AArch64 implementation to match immediates based on what addvl and inc[h\|w\|d] support.	2024-03-20 10:28:46 +00:00
Graham Hunter	cd768ec983	[AArch64] Support scalable offsets with isLegalAddressingMode (#83255 ) Allows us to indicate that an addressing mode featuring a vscale-relative immediate offset is supported.	2024-03-20 10:13:20 +00:00
Takuya Shimizu	65058a8d73	[AArch64][SelectionDAG] Expand v1f64-typed sin,cos,pow,log,exp intrinsics (#83745 ) This patch makes NEON-enabled AArch64 backend expand the `sin, cos, pow, log, log2, log10, exp, exp2, exp10` intrinsics for `v1f64` data type, all of which caused selection failure before this patch. Fixes https://github.com/llvm/llvm-project/issues/83729	2024-03-17 11:28:32 +09:00
Sander de Smalen	e639e7e986	[AArch64] NFC: Simplify the smstart/smstop pseudo. (#85067) This is just a bit of cleanup to make the pseudo/code easier to understand. This is based on the observation that we only need to pass in a runtime value for 'pstate' if it is actually needed for generating a runtime check.	2024-03-15 08:50:58 +00:00
Usman Nadeem	0b46884036	Revert "Revert "[AArch64] Improve lowering of truncating uzp1"" (#85119 ) Reverts llvm/llvm-project#85115 The fix was already merged in `79cd2c0bb9`	2024-03-13 11:58:10 -07:00
Mehdi Amini	06e310fee1	Revert "[AArch64] Improve lowering of truncating uzp1" (#85115 ) Reverts llvm/llvm-project#82457 The bot is broken, likely because of mid-air collision.	2024-03-13 11:32:53 -07:00
Usman Nadeem	57b991ab39	[AArch64] Improve lowering of truncating uzp1 (#82457) There were two existing patterns: `concat_vectors(trunc(x), trunc(y)) -> uzp1(x, y)` `concat_vectors(assertzext(trunc(x)), assertzext(trunc(y))) -> uzp1(x, y)` Move them into a class and add the following `assertsext` pattern to it: `concat_vectors(assertsext(trunc(x)), assertsext(trunc(y))) -> uzp1(x, y)` Add the following transform for v8i8 and v4i16 result types to help with pattern matching: `truncating uzp1(x, y) -> trunc(concat(x, y))` And a pattern to go with it: `trunc(concat_vectors(x, y)) -> uzp1 (x, y)` Add another isel pattern for v8i8 and v4i16 result vector types, similar to the existing concat pattern, but with a trunc node in the beginning: `trunc(concat_vectors(assertext_trunc(x), assertext_trunc(y))) -> xtn(uzp1(x, y))`	2024-03-13 09:05:55 -07:00
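Why `concat_vectors(trunc(x), trunc(y))` matches `uzp1(x, y)` can be seen in a lane-level model. The sketch below is Python rather than LLVM code, assumes little-endian lane order, and uses hypothetical helper names. It views each 32-bit lane as two 16-bit lanes and keeps the even-indexed ones.

```python
def as_i16_lanes(v32):
    """Reinterpret 32-bit lanes as 16-bit lanes (little-endian: low half first)."""
    out = []
    for lane in v32:
        out += [lane & 0xFFFF, (lane >> 16) & 0xFFFF]
    return out

def uzp1(a, b):
    """Even-indexed elements of a, followed by even-indexed elements of b."""
    return (a + b)[0::2]  # len(a) is even here, so parity carries over into b

def concat_trunc(x32, y32):
    """concat_vectors(trunc(x), trunc(y)) with 32-bit to 16-bit truncation."""
    return [lane & 0xFFFF for lane in x32 + y32]
```

The even-indexed 16-bit lanes are exactly the low halves of the 32-bit lanes, which is what the truncation keeps.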
Sander de Smalen	e42e97a4ad	[AArch64][SME] Don't mark 'smstart za' as using/defining VG. (#84775 ) VG is only used/defined when changing the streaming mode, using 'smstart sm' or plainly 'smstart' (same for smstop).	2024-03-13 08:21:33 +00:00
David Majnemer	edc1c3d24e	[AArch64] Make more vector f16 operations legal v8f16 is a legal type but promoting to v16f16 would result in an illegal type. Let's legalize these by a combination of splitting+promoting resulting in a pair of v4f16. Also, we were being overly cautious with different v4f16 nodes. Mark more of them safe to promote to v4f32.	2024-03-08 19:52:54 +00:00
David Majnemer	5f935e9181	[AArch64] Optimize fp64 <-> fp16 SIMD conversions Legalization would result in needless scalarization. Add some DAGCombines to fix this up.	2024-03-08 19:52:53 +00:00
David Majnemer	9e759f3523	[AArch64] Fix fptoi/itofp for bf16 There were a number of issues that needed to be addressed: - i64 to bf16 did not correctly round - strict rounding needed to yield a chain - fastisel did not have logic to bail on bf16	2024-03-06 06:17:39 +00:00
Fangrui Song	201572e34b	[AArch64] Implement -fno-plt for SelectionDAG/GlobalISel Clang sets the nonlazybind attribute for certain ObjC features. The AArch64 SelectionDAG implementation for non-intrinsic calls (46e36f0953aabb5e5cd00ed8d296d60f9f71b424) is behind a cl option. GCC implements -fno-plt for a few ELF targets. In Clang, -fno-plt also sets the nonlazybind attribute. For SelectionDAG, make the cl option not affect ELF so that non-intrinsic calls to a dso_preemptable function use GOT. Adjust AArch64TargetLowering::LowerCall to handle intrinsic calls. For FastISel, change `fastLowerCall` to bail out when a call is due to -fno-plt. For GlobalISel, handle non-intrinsic calls in CallLowering::lowerCall and intrinsic calls in AArch64CallLowering::lowerCall (where the target-independent CallLowering::lowerCall is not called). The GlobalISel test in `call-rv-marker.ll` is therefore updated. Note: the current -fno-plt -fpic implementation does not use GOT for a preemptable function. Link: #78275 Pull Request: https://github.com/llvm/llvm-project/pull/78890	2024-03-05 13:55:29 -08:00
Benjamin Kramer	8cc8fdaf5c	[AArch64] Also promote vector bf16 INT_TO_FP to f32 This mirrors the scalar version.	2024-03-04 23:34:56 +01:00
David Majnemer	930e7ff9ae	[AArch64] Optimize abs, neg and copysign for fp16/bf16 We can use bitwise arithmetic to implement these, making them considerably faster than legalization via promotion.	2024-03-04 20:05:05 +00:00
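The bitwise idea for abs, neg and copysign can be modelled on raw 16-bit patterns. This is a minimal Python sketch with hypothetical helper names, not the DAG lowering itself. It uses fp16 for the round-trip helpers because `struct` supports it directly; bf16 has its sign bit in the same position.

```python
import struct

SIGN = 0x8000  # sign bit of a 16-bit float (same position for fp16 and bf16)

def f16_bits(x: float) -> int:
    """Bit pattern of x rounded to IEEE half precision."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def bits_f16(b: int) -> float:
    """Half-precision value for a 16-bit pattern."""
    return struct.unpack("<e", struct.pack("<H", b))[0]

def fabs_bits(b: int) -> int:
    return b & ~SIGN & 0xFFFF          # clear the sign bit

def fneg_bits(b: int) -> int:
    return b ^ SIGN                    # flip the sign bit

def copysign_bits(mag: int, sgn: int) -> int:
    return (mag & ~SIGN & 0xFFFF) | (sgn & SIGN)  # magnitude of mag, sign of sgn
```

None of these operations looks at the exponent or mantissa, so no promotion to a wider float type is needed.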
David Majnemer	23bc5b6392	[AArch64] Mark bf16 as custom for truncating stores & add a comment While we don't use SVE2 as a fallback for missing NEON instructions for BF16, it is confusing to break symmetry with fp16. While we are here, add a comment explaining how BF16 immediates work.	2024-03-04 06:33:25 +00:00
David Majnemer	3dd6750027	[AArch64] Add more complete support for BF16 We can use a small amount of integer arithmetic to round FP32 to BF16 and extend BF16 to FP32. While a number of operations still require promotion, this can be reduced for some rather simple operations like abs, copysign, fneg but these can be done in a follow-up. A few neat optimizations are implemented: - round-inexact-to-odd is used for F64 to BF16 rounding. - quieting signaling NaNs for f32 -> bf16 tries to detect if a prior operation makes it unnecessary.	2024-03-03 22:39:50 +00:00
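A common integer-only way to round FP32 to BF16 (round to nearest, ties to even) and to extend BF16 back to FP32 is sketched below in Python. It illustrates the general technique and is not necessarily the exact sequence the patch emits. The function names are hypothetical, and the round-to-odd path for F64 is not modelled.

```python
import struct

def f32_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]

def f32_to_bf16(bits: int) -> int:
    """Round an FP32 bit pattern to a BF16 bit pattern using integer ops."""
    if (bits & 0x7FFFFFFF) > 0x7F800000:
        # NaN: keep the top half and set the quiet bit so it stays a NaN.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    lsb = (bits >> 16) & 1             # lowest bit that survives truncation
    return ((bits + 0x7FFF + lsb) >> 16) & 0xFFFF  # ties go to even

def bf16_to_f32(b: int) -> int:
    """Extension is exact: BF16 is the top 16 bits of an FP32 pattern."""
    return (b & 0xFFFF) << 16
```

For example, `0x3F808000` lies exactly halfway between two BF16 values and rounds to the even one, `0x3F80`.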
David Green	0e9a102129	[AArch64] Remove unused AArch64ISD::BIT. NFC These were last used in the fcopysign lowering, which now uses AArch64ISD::BSP.	2024-03-01 11:44:58 +00:00
Sander de Smalen	5bd01ac822	[AArch64] Re-enable rematerialization for streaming-mode-changing functions. (#83235 ) We can add implicit defs/uses of the 'VG' register to the instructions to prevent the register allocator from rematerializing values in between streaming-mode changes, as the def/use of VG will further nail down the ordering that comes out of ISel. This avoids the heavy-handed approach to prevent any kind of rematerialization. While we could add 'VG' as a Use to all SVE instructions, we only really need to do this for instructions that are rematerializable, as the smstart/smstop instructions and pseudos act as scheduling barriers which is sufficient to prevent other instructions from being scheduled in between the streaming-mode-changing call sequence. However, we may revisit this in the future.	2024-02-29 15:35:46 +00:00
Lukacma	26402777eb	[AArch64] Optimized generated assembly for bool to svbool_t conversions (#83001 ) In certain cases Legalizer was generating `AND(WHILELO, SPLAT 1)` instruction pattern, when `WHILELO` would be sufficient.	2024-02-28 16:45:39 +00:00
Sander de Smalen	41427b0e8e	[AArch64] Disable FastISel/GlobalISel for ZT0 state (#82768) For __arm_new("zt0") we need to have special setup code in the prologue. For calls that don't preserve zt0, we need to emit code to preserve ZT0 around the call. This is only emitted by SelectionDAG ISel at the moment.	2024-02-28 10:42:16 +00:00
Nashe Mncube	744c0057e7	[AArch64][CodeGen] Fix crash when fptrunc returns fp16 with +nofp attr (#81724) When performing lowering of the fptrunc opcode returning fp16 with the +nofp flag enabled we could trigger a compiler crash. This is because we had no custom lowering implemented. This patch handles the case in which we need to promote an fp16 return type for fptrunc when the +nofp attr is enabled.	2024-02-22 19:15:52 +00:00
Dinar Temirbulatov	5a023f564f	[AArch64][SVE2] Enable dynamic shuffle for fixed length types. (#72490) When the SVE register size is unknown, or the minimal size is not equal to the maximum size, we can determine the actual SVE register size at runtime and adjust the shuffle mask at runtime.	2024-02-21 14:59:47 +00:00
David Green	6c84709eff	[AArch64] Materialize constants via fneg. (#80641) This is something that is already done as a special case for copysign, this patch extends it to be more generally applied. If we are trying to materialize a negative constant (notably -0.0, 0x80000000), then there may be no movi encoding that creates the immediate, but a fneg(movi) might. Some of the existing patterns for RADDHN needed to be adjusted to keep them in line with the new immediates.	2024-02-14 13:55:51 +00:00
Usman Nadeem	44d85c5b15	[AArch64][SVE2] Use a PatFrag for URSHR (#81304 ) Follow-up for #78374	2024-02-12 09:03:41 -08:00
Nikita Popov	92d7992205	[AArch64] Only apply bool vector bitcast opt if result is scalar (#81256 ) This optimization tries to optimize bitcasts from `<N x i1>` to iN, but currently also triggers for `<N x i1>` to `<M x iK>` bitcasts, if custom lowering has been requested for these for an unrelated reason. Fix this by explicitly checking that the result type is scalar. Fixes https://github.com/llvm/llvm-project/issues/81216.	2024-02-12 10:00:34 +01:00
ostannard	5452cbc4a6	[AArch64] Indirect tail-calls cannot use x16 with pac-ret+pc (#81020) When using -mbranch-protection=pac-ret+pc, x16 is used in the function epilogue to hold the address of the signing instruction. This is used by a HINT instruction which can only use x16, so we can't change this. This means that we can't use it to hold the function pointer for an indirect tail-call. There is existing code to force indirect tail-calls to use x16 or x17 when BTI is enabled, so there are now 4 combinations (bti / pac-ret+pc: valid function pointer registers): off / off: any non callee-saved register; on / off: x16 or x17; off / on: any non callee-saved register except x16; on / on: x17.	2024-02-08 15:31:54 +00:00
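The four combinations in this commit message amount to a small decision function. The Python sketch below only restates the table and is not taken from the backend. The function name is hypothetical.

```python
def tail_call_pointer_registers(bti: bool, pac_ret_pc: bool) -> str:
    """Registers allowed to hold the function pointer of an indirect tail-call."""
    if bti and pac_ret_pc:
        return "x17"                       # BTI allows x16/x17, pac-ret+pc rules out x16
    if bti:
        return "x16 or x17"
    if pac_ret_pc:
        return "any non callee-saved register except x16"
    return "any non callee-saved register"
```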
Rin Dobrescu	7f292b8fb1	[AArch64] Convert concat(uhadd(a,b), uhadd(c,d)) to uhadd(concat(a,c), concat(b,d)) (#80674 ) We can convert concat(v4i16 uhadd(a,b), v4i16 uhadd(c,d)) to v8i16 uhadd(concat(a,c), concat(b,d)), which can lead to further simplifications.	2024-02-06 11:02:06 +00:00
Fangrui Song	d4de4c3eaf	[AArch64] Support optional constant offset for constraint "S" (#80255 ) Modify the initial implementation (https://reviews.llvm.org/D46745) to support a constant offset so that the following code will compile: ``` int a[2][2]; void foo() { asm("// %0" :: "S"(&a[1][1])); } ``` We use the generic code path for "s". In GCC's aarch64 port, "S" is supported for PIC while "s" isn't, making "s" less useful. We implement "S" but not "s". Similar to #80201 for RISC-V.	2024-02-02 10:33:09 -08:00
Matthew Devereau	d9c20e437f	[AArch64][SME] Implement inline-asm clobbers for za/zt0 (#79276) This enables specifying "za" or "zt0" in the clobber list for inline asm. This complies with the ACLE SME addition to the asm extension here: https://github.com/ARM-software/acle/pull/276	2024-02-02 08:12:05 +00:00
Usman Nadeem	1d1432356e	[AArch64][SVE2] Generate urshr rounding shift rights (#78374) Add a new node `AArch64ISD::URSHR_I_PRED`. `srl(add(X, 1 << (ShiftValue - 1)), ShiftValue)` is transformed to `urshr`, or to `rshrnb` (as before) if the result is truncated. `uzp1(rshrnb(uunpklo(X),C), rshrnb(uunpkhi(X), C))` is converted to `urshr(X, C)` (tested by the wide_trunc tests). Pattern matching code in `canLowerSRLToRoundingShiftForVT` is taken from prior code in rshrnb. It returns true if the add has NUW or if the number of bits used in the return value allow us to not care about the overflow (tested by rshrnb test cases).	2024-01-31 14:03:58 -08:00
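The need for the NUW condition can be seen in a scalar model of one 8-bit lane. This is a Python sketch with hypothetical names, not the code in `canLowerSRLToRoundingShiftForVT`. It compares a rounding shift computed without intermediate overflow against the wrapped `srl(add(...))` form.

```python
def urshr(x: int, shift: int, bits: int = 8) -> int:
    """Rounding shift right with no intermediate overflow."""
    return ((x + (1 << (shift - 1))) >> shift) & ((1 << bits) - 1)

def srl_add(x: int, shift: int, bits: int = 8) -> int:
    """srl(add(x, 1 << (shift - 1)), shift) with the add wrapping at `bits`."""
    mask = (1 << bits) - 1
    return ((x + (1 << (shift - 1))) & mask) >> shift

def add_is_nuw(x: int, shift: int, bits: int = 8) -> bool:
    """True when the add cannot wrap (no unsigned wrap)."""
    return x + (1 << (shift - 1)) <= (1 << bits) - 1
```

With `x = 255` and `shift = 1` the add wraps: `srl_add` gives 0 while `urshr` gives 128, so the two forms only agree when the add is known not to overflow or the affected bits are unused.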
Rin Dobrescu	2907c63311	Revert "[AArch64] Convert concat(uhadd(a,b), uhadd(c,d)) to uhadd(concat(a,c), concat(b,d))" (#80157 ) Reverts llvm/llvm-project#79464 while figuring out why the tests are failing.	2024-01-31 16:45:25 +00:00
Rin Dobrescu	cf828aee24	[AArch64] Convert concat(uhadd(a,b), uhadd(c,d)) to uhadd(concat(a,c), concat(b,d)) (#79464 ) We can convert concat(v4i16 uhadd(a,b), v4i16 uhadd(c,d)) to v8i16 uhadd(concat(a,c), concat(b,d)), which can lead to further simplifications.	2024-01-31 12:52:12 +00:00
Sander de Smalen	dd73666182	[SME] Stop RA from coalescing COPY instructions that transcend beyond smstart/smstop. (#78294) This patch introduces a 'COALESCER_BARRIER' which is a pseudo node that expands to a 'nop', but which stops the register allocator from coalescing a COPY node when its use/def crosses a SMSTART or SMSTOP instruction. For example: %0:fpr64 = COPY killed $d0 undef %2.dsub:zpr = COPY %0 // <- Do not coalesce this COPY ADJCALLSTACKDOWN 0, 0 MSRpstatesvcrImm1 1, 0, csr_aarch64_smstartstop, implicit-def dead $d0 $d0 = COPY killed %0 BL @use_f64, csr_aarch64_aapcs If the COPY would be coalesced, that would lead to: $d0 = COPY killed %0 being replaced by: $d0 = COPY killed %2.dsub which means the whole ZPR reg would be live up to the call, causing the MSRpstatesvcrImm1 (smstop) to spill/reload the ZPR register: str q0, [sp] // 16-byte Folded Spill smstop sm ldr z0, [sp] // 16-byte Folded Reload bl use_f64 which would be incorrect for two reasons: 1. The program may load more data than it has allocated. 2. If there are other SVE objects on the stack, the compiler might use the 'mul vl' addressing modes to access the spill location. By disabling the coalescing, we get the desired results: str d0, [sp, #8] // 8-byte Folded Spill smstop sm ldr d0, [sp, #8] // 8-byte Folded Reload bl use_f64	2024-01-31 09:04:13 +00:00
Billy Laws	c761b4a5e4	[AArch64] Fix variadic tail-calls on ARM64EC (#79774) ARM64EC varargs calls expect that x4 = sp at entry, special handling is needed to ensure this with tail calls since they occur after the epilogue and the x4 write happens before. I tried going through AArch64MachineFrameLowering for this, hoping to avoid creating the dummy object, but this was the best I could do since the stack info that it uses isn't populated at this stage. CreateFixedObject also explicitly forbids 0 sized objects.	2024-01-30 18:32:15 -08:00
Florian Hahn	d1e162e5d9	[AArch64] Add custom lowering for load <3 x i8>. (#78632 ) Add custom combine to lower load <3 x i8> as the more efficient sequence below: ldrb wX, [x0, #2] ldrh wY, [x0] orr wX, wY, wX, lsl #16 fmov s0, wX At the moment, there are almost no cases in which such vector operations will be generated automatically. The motivating case is non-power-of-2 SLP vectorization: https://github.com/llvm/llvm-project/pull/77790	2024-01-30 14:04:27 +00:00
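The `ldrb`/`ldrh`/`orr` sequence for `load <3 x i8>` can be modelled on a byte buffer. The Python sketch below assumes little-endian memory and uses hypothetical function names. It shows only how the three bytes end up in one 32-bit word, not the DAG combine itself.

```python
def load_v3i8_as_word(mem: bytes, addr: int = 0) -> int:
    """Assemble three consecutive bytes into the low 24 bits of a word."""
    hi = mem[addr + 2]                                   # ldrb wX, [x0, #2]
    lo = int.from_bytes(mem[addr:addr + 2], "little")    # ldrh wY, [x0]
    return lo | (hi << 16)                               # orr wX, wY, wX, lsl #16

def lanes(word: int):
    """The three i8 lanes held in the assembled word, lowest address first."""
    return [(word >> (8 * i)) & 0xFF for i in range(3)]
```

Only bytes 0 to 2 are read, so the sequence never touches the byte after the vector.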
David Green	9520773c46	[AArch64] Don't generate neon integer complex numbers with +sve2. NFC (#79829 ) The condition for allowing integer complex number support could also allow neon fixed length complex numbers if +sve2 was specified. This tightens the condition to only allow integer complex number support for scalable vectors. We could generalize this in the future to generate SVE intrinsics for fixed-length vectors, but for the moment this opts for the simpler fix.	2024-01-29 16:46:22 +00:00
Kazu Hirata	8f8cab6b78	[llvm] Use Instruction::hasMetadata (NFC)	2024-01-27 22:20:22 -08:00
Eli Friedman	bee1557ffc	[NFC][AArch64] Fix indentation.	2024-01-26 10:26:19 -08:00
Florian Hahn	eb678d8993	[AArch64] Combine store (trunc X to <3 x i8>) to sequence of ST1.b. (#78637 ) Improve codegen for (trunc X to <3 x i8>) by converting it to a sequence of 3 ST1.b, but first converting the truncate operand to either v8i8 or v16i8, extracting the lanes for the truncate results and storing them. At the moment, there are almost no cases in which such vector operations will be generated automatically. The motivating case is non-power-of-2 SLP vectorization: https://github.com/llvm/llvm-project/pull/77790 PR: https://github.com/llvm/llvm-project/pull/78637	2024-01-25 18:28:44 +00:00
Nico Weber	184ca39529	[llvm] Move CodeGenTypes library to its own directory (#79444 ) Finally addresses https://reviews.llvm.org/D148769#4311232 :) No behavior change.	2024-01-25 12:01:31 -05:00
Eli Friedman	a6065f0fa5	Arm64EC entry/exit thunks, consolidated. (#79067 ) This combines the previously posted patches with some additional work I've done to more closely match MSVC output. Most of the important logic here is implemented in AArch64Arm64ECCallLowering. The purpose of the AArch64Arm64ECCallLowering is to take "normal" IR we'd generate for other targets, and generate most of the Arm64EC-specific bits: generating thunks, mangling symbols, generating aliases, and generating the .hybmp$x table. This is all done late for a few reasons: to consolidate the logic as much as possible, and to ensure the IR exposed to optimization passes doesn't contain complex arm64ec-specific constructs. The other changes are supporting changes, to handle the new constructs generated by that pass. There's a global llvm.arm64ec.symbolmap representing the .hybmp$x entries for the thunks. This gets handled directly by the AsmPrinter because it needs symbol indexes that aren't available before that. There are two new calling conventions used to represent calls to and from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few changes to handle the associated exception-handling info, SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX. I've intentionally left out handling for structs with small non-power-of-two sizes, because that's easily separated out. The rest of my current work is here. I squashed my current patches because they were split in ways that didn't really make sense. Maybe I could split out some bits, but it's hard to meaningfully test most of the parts independently. Thanks to @dpaoliello for extensive testing and suggestions. (Originally posted as https://reviews.llvm.org/D157547 .)	2024-01-22 21:28:07 -08:00
Rin Dobrescu	365aa1574a	[AArch64] Convert UADDV(add(zext, zext)) into UADDLV(concat). (#78301 ) We can convert a UADDV(add(zext(64-bit source), zext(64-bit source))) into UADDLV(concat), where the concat represents the 64-bit zext sources.	2024-01-22 11:59:40 +00:00
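The equivalence behind this combine can be checked with a lane-level model. This Python sketch uses hypothetical names and assumes two v8i8 sources widened to 16-bit lanes. It compares a reduction over the widened sum with a widening reduction over the concatenated bytes.

```python
def uaddv_of_add_zext(a, b):
    """UADDV over add(zext(a), zext(b)): widen to 16 bits, add lanes, then reduce."""
    widened_sum = [(x + y) & 0xFFFF for x, y in zip(a, b)]
    return sum(widened_sum) & 0xFFFF

def uaddlv_of_concat(a, b):
    """UADDLV over concat(a, b): widening sum of all 16 byte lanes."""
    return sum(a + b) & 0xFFFF
```

Sixteen byte lanes sum to at most 16 * 255 = 4080, so neither form can overflow the 16-bit result.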
Kerry McLaughlin	a8a3711e74	[AArch64][SME2] Preserve ZT0 state around function calls (#78321 ) If a function has ZT0 state and calls a function which does not preserve ZT0, the caller must save and restore ZT0 around the call. If the caller shares ZT0 state and the callee is not shared ZA, we must additionally call SMSTOP/SMSTART ZA around the call. This patch adds new AArch64ISDNodes for spilling & filling ZT0. Where requiresPreservingZT0 is true, ZT0 state will be preserved across a call.	2024-01-20 12:06:00 +00:00


1915 Commits