llvm-project

Author	SHA1	Message	Date
Mirko	4d3c427f33	[CodeGen] Use first EHLabel as a stop gate for live range shrinking (#114195 ) This fixes issue #114194 The issue happens during the `LiveRangeShrink` pass, which runs early, before phi elimination. LandingPads, which are lowered to EHLabels, need to be the first non phi instruction in an EHPad. In case of a phi node being in front of the EHLabel and a use being after the EHLabel, we hoist the use in front of the label. This results in a portion of the landingpad missing due to being hoisted in front of the label.	2024-11-01 19:13:18 -07:00
Alex MacLean	8ff60c4d47	[NVPTX] Add support for nvvm.flo.[us] intrinsics (#114489 ) Add support for '`llvm.nvvm.flo.[su].*`' intrinsics which correspond to a PTX `bfind` instruction. See [PTX ISA 9.7.1.16. Integer Arithmetic Instructions: bfind] (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#integer-arithmetic-instructions-bfind) The '`llvm.nvvm.flo.u`' family of intrinsics identifies the bit position of the leading one, returning its offset from either the most or least significant bit. The '`llvm.nvvm.flo.s`' family of intrinsics identifies the bit position of the leading non-sign bit, returning its offset from either the most or least significant bit.	2024-11-01 16:35:43 -07:00
Alex MacLean	57183b6fe1	[NVPTX] Add support for stacksave, stackrestore intrinsics (#114484 ) Add support for the '`@llvm.stacksave`' and '`@llvm.stackrestore`' intrinsics to NVPTX. These are implemented with the `stacksave` and `stackrestore` PTX instructions respectively. See [PTX ISA 9.7.17. Stack Manipulation Instructions] (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#stack-manipulation-instructions).	2024-11-01 13:26:28 -07:00
Matt Arsenault	8e61aaa021	AMDGPU: Fix illegal commute with frame index (#114497 ) In ca409892c5396fa3fbb8ea4dbf53d0e952f36d09, frame indexes started being treated more like registers, rather than immediates. Update the commute logic to avoid failing the verifier by moving illegal SGPR operands in place of a frame index.	2024-11-01 10:02:29 -07:00
Luke Lau	faa385a9f4	[RISCV] Add tests for length changing shuffles Tests taken from Luke's 88147 with minimal changes by me (preames). The main case of interest here is when mask length is less than source length (i.e. length is decreasing). We often scalarize these, which on RISCV can be quite painful.	2024-11-01 09:51:39 -07:00
Krzysztof Drewniak	ea33af63de	Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714 )" v3 (#114443 ) This reverts commit 8a849a2a567d4e519b246a16936b6e7519936d4b. It seems I missed a spot when trying to ensure the code in the instruction selection tests were actually legalized MIR.	2024-11-01 11:13:29 -05:00
peterbell10	b74e588e1f	[NVPTX] Don't use stack memory when bitcasting to/from v2i8 (#113928 ) `v2i8` is an unsupported type, so we hit the default legalization rules, which perform the bitcast in stack memory; this is very inefficient on GPU. This adds a custom lowering where we pack `v2i8` into `i16` and from there use another bitcast node to reach the final desired type. It also adds the inverse, unpacking `i16` into `v2i8`.	2024-11-01 08:02:43 -07:00
Philip Reames	58f525a23c	[RISCV] Add tests for deinterleave shuffles w/o vnsrl.vv With SEW=64, the vnsrl trick we primarily rely on does not work. This is handled correctly today, but we have fairly minimal testing of the resulting shuffles, which makes it hard to demonstrate the value of an upcoming change.	2024-11-01 08:02:04 -07:00
Oliver Stannard	33411d5207	[ARM] Fix CMSE S->NS calls when CONTROL_S.SFPA==0 (CVE-2024-7883) (#114433 ) When doing a call from CMSE secure state to non-secure state for v8-M.main, we use the VLLDM and VLSTM instructions to save, clear and restore the FP registers around the call. These instructions both check the CONTROL_S.SFPA bit, and if it is clear (meaning the current contents of the FP registers are not secret) they execute as no-ops. This causes a problem when CONTROL_S.SFPA==0 before the call, which happens if there are no floating-point instructions executed between entry to secure state and the call. If this is the case, then the VLSTM instruction will do nothing, leaving the save area in the stack uninitialised. If the called function returns a value in floating-point registers, the call sequence includes an instruction to copy the return value from a floating-point register to a GPR, which must be before the VLLDM instruction. This copy sets CONTROL_S.SFPA, meaning that the VLLDM will fully execute, and load the uninitialised stack memory into the FP registers. This causes two problems: * The FP register file is clobbered, including all of the callee-saved registers, which might contain live values. * The stack region might contain secret values, which will be leaked to non-secure state through the floating-point registers if/when we return to non-secure state. The fix is to insert a `vmov s0, s0` instruction before the VLSTM instruction, to ensure that CONTROL_S.SFPA is set for both the VLLDM and VLSTM instruction. CVE: https://www.cve.org/cverecord?id=CVE-2024-7883 Security bulletin: https://developer.arm.com/Arm%20Security%20Center/Cortex-M%20Security%20Extensions%20Vulnerability	2024-11-01 09:36:13 +00:00
Daniil Kovalev	da083e358e	[PAC][CodeGen][ELF][AArch64] Support signed GOT (#113811 ) This re-applies #96164 after revert in #102434. Support the following relocations and assembly operators: - `R_AARCH64_AUTH_ADR_GOT_PAGE` (`:got_auth:` for `adrp`) - `R_AARCH64_AUTH_LD64_GOT_LO12_NC` (`:got_auth_lo12:` for `ldr`) - `R_AARCH64_AUTH_GOT_ADD_LO12_NC` (`:got_auth_lo12:` for `add`) `LOADgotAUTH` pseudo-instruction is introduced which is later expanded to actual instruction sequence like the following. ``` adrp x16, :got_auth:sym add x16, x16, :got_auth_lo12:sym ldr x0, [x16] autia x0, x16 ``` If a resign is requested, like below, `LOADgotPAC` pseudo is used, and GOT load is lowered similarly to `LOADgotAUTH`. ``` @var = global i32 0 define ptr @resign_globalvar() { ret ptr ptrauth (ptr @var, i32 3, i64 43) } ``` If FPAC bit is not set and auth instruction is emitted, a check+trap sequence similar to one used for `AUT` pseudo is emitted to ensure auth success. Both SelectionDAG and GlobalISel are supported. For FastISel, we fall back to SelectionDAG. Tests starting with 'ptrauth-' have corresponding variants w/o this prefix. See also specification https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#appendix-signed-got	2024-11-01 12:21:10 +03:00
Phoebe Wang	c72a751dab	[X86][AMX] Support AMX-TRANSPOSE (#113532 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-11-01 16:45:03 +08:00
Thorsten Schütt	8e3772744d	[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE (#114470 ) There are patterns for: * {nxv2s32, s32, s64}, * {nxv4s16, s16, s64}, * {nxv2s16, s16, s64}	2024-11-01 06:10:26 +01:00
Hervé Poussineau	6fa1647a47	[MC][Mips] Rename MipsMCAsmInfo to MipsELFMCAsmInfo (#112592 ) Also change MipsAsmPrinter::emitStartOfAsmFile to emit ELF-related sections only when using ELF output file format.	2024-11-01 08:42:34 +08:00
Ruiling, Song	54d31bde32	Reapply "StructurizeCFG: Optimize phi insertion during ssa reconstruction (#101301 )" (#114347 ) This reverts commit be40c723ce2b7bf2690d22039d74d21b2bd5b7cf.	2024-11-01 08:29:59 +08:00
Justin Fargnoli	a1987beac5	Reland "[NVPTX] Prefer prmt.b32 over bfi.b32" (#114326 ) Fix [failure](https://github.com/llvm/llvm-project/pull/110766#discussion_r1796832635) identified by @akuegel. --- In [[NVPTX] Improve lowering of v4i8](`cbafb6f2f5`) @Artem-B added the ability to lower ISD::BUILD_VECTOR with bfi PTX instructions. @Artem-B did this because: (https://github.com/llvm/llvm-project/pull/67866#discussion_r1343066911) Under the hood byte extraction/insertion ends up as BFI/BFE instructions, so we may as well do that in PTX, too. https://godbolt.org/z/Tb3zWbj9b However, the example that @Artem-B linked was targeting sm_52. On modern architectures, ptxas uses prmt.b32. [Example](https://godbolt.org/z/Ye4W1n84o). Thus, remove uses of NVPTXISD::BFI in favor of NVPTXISD::PRMT.	2024-10-31 16:09:20 -07:00
Thorsten Schütt	aa70d846b0	[GlobalISel][AArch64] Legalize G_SPLAT_VECTOR (#114006 ) {nxv8s16, s16} fails to select. {nxv16s8, s8} no patterns available.	2024-10-31 22:20:08 +01:00
zhijian lin	674574d25c	Promote 32bit pseudo instr that infer extsw removal to 64bit in PPCMIPeephole (#85451 ) Fixes: https://github.com/llvm/llvm-project/issues/71030 Bug only happens in 64bit involving spills. Since we don't know when the spill will happen, all instructions in the chain used to deduce sign extension for eliminating 'extsw' will need to be promoted to 64-bit pseudo instructions. The following instructions will be promoted in PPCMIPeepholes: EXTSH, LHA, ISEL to EXTSH8, LHA8, ISEL8	2024-10-31 15:49:36 -04:00
Matt Arsenault	e3222e6f80	AMDGPU: Add baseline tests for cmpxchg custom expansion (#109408 ) We need a non-atomic path if flat may access private.	2024-10-31 11:46:13 -07:00
Zaara Syeda	155956834a	Fix failing test gcov_ctr_ref_init.ll (#114428 ) Fix test failing after merge of commit: ccddd136024305be8b6aa77e4ce576d8e6521529	2024-10-31 13:08:16 -04:00
Simon Pilgrim	9fb4bc5bf4	[DAG] SimplifyMultipleUseDemandedBits - ignore SRL node if we're just demanding known sign bits (#114389 ) Check to see if we are only demanding (shifted) signbits from a SRL node that are also signbits in the source node. We can't demand any upper zero bits that the SRL will shift in (up to max shift amount), and the lower demanded bits bound must already be all signbits.	2024-10-31 16:40:29 +00:00
WÁNG Xuěruì	f246b5f547	[LoongArch] Support bswap for LSX/LASX VTs (#114171 ) On top of #114170	2024-11-01 00:38:13 +08:00
hev	f7a96dc664	[LoongArch] Ensure pcaddu18i and jirl adjacency in tail calls for correct relocation (#113932 ) Prior to this patch, both `pcaddu18i` and `jirl` were marked as scheduling boundaries to prevent instruction reordering that would disrupt their adjacency. However, in certain cases, epilogues were still being inserted between these two instructions, breaking the required proximity. This patch ensures that `pcaddu18i` and `jirl` remain adjacent even in the presence of epilogues, maintaining correct relocation behavior for tail calls on LoongArch.	2024-11-01 00:08:15 +08:00
Zaara Syeda	ccddd13602	Enable aggressive constant merge in GlobalMerge for AIX (#113956 ) Enable merging all constants without looking at use in GlobalMerge by default to replace PPCMergeStringPool pass on AIX.	2024-10-31 11:22:48 -04:00
Matt Arsenault	1d0370872f	AMDGPU: Expand flat atomics that may access private memory (#109407 ) If the runtime flat address resolves to a scratch address, 64-bit atomics do not work correctly. Insert a runtime address space check (which is quite likely to be uniform) and select between the non-atomic and real atomic cases. Consider noalias.addrspace metadata and avoid this expansion when possible (we also need to consider it to avoid infinitely expanding after adding the predication code).	2024-10-31 08:08:48 -07:00
Matt Arsenault	db5bcb24c2	GlobalISel: Fix combine duplicating atomic loads (#111730 ) The sext_inreg (load) combine was not deleting the old load instruction, and it would never be deleted if volatile or atomic.	2024-10-31 07:55:12 -07:00
Sander de Smalen	41448c1d07	[AArch64] NFC: Add RUN line for +sve2 for sve-intrinsics-perm-select.ll The codegen for SVE and SVE2 may be different (e.g. for splice and ext). A follow-up patch will improve codegen for EXT.	2024-10-31 14:46:00 +00:00
Matt Arsenault	12409024d3	AMDGPU/GlobalISel: Handle atomic sextload and zextload (#111721 ) Atomic loads are handled differently from the DAG, and have separate opcodes and explicit control over the extensions, like ordinary loads. Add new patterns for these. There's room for cleanup and improvement. d16 cases aren't handled. Fixes #111645	2024-10-31 07:44:52 -07:00
WÁNG Xuěruì	5581e43a2b	[LoongArch][NFC] Pre-commit tests for LSX/LASX bswap codegen (#114170 )	2024-10-31 21:10:26 +08:00
Nathan Gauër	cf3d6fded9	[SPIR-V] Re-enable -verify-machineinstrs on tests (#114388 ) Many tests had this flag removed because of the G_BITCAST emission issue. Now that the PR is merged, we can re-enable this additional check. 2 tests (basic_int_types) just have the TODO removed because they are not useful for SPIR-V as-is: SPIR-V requires reg2mem/mem2reg to run, which removes all the body. Integers are used in other spirv tests, and seems like testing for spirv32/64 and relying on others for the logical target coverage should be fine. Signed-off-by: Nathan Gauër <brioche@google.com>	2024-10-31 13:55:30 +01:00
Benjamin Maxwell	89a8c71db6	[SDAG] Support expanding `FSINCOS` to vector library calls (#114039 ) This shares most of its code with the scalar sincos expansion. It allows expanding vector FSINCOS nodes to a library call from the specified `-vector-library`. The upside of this is it will mean the vectorizer only needs to handle the sincos intrinsic, which has no memory effects, and this can handle lowering the intrinsic to a call that takes output pointers.	2024-10-31 12:41:43 +00:00
Pengcheng Wang	18f0f70934	[RISCV] Support llvm.masked.expandload intrinsic (#101954 ) We can use `viota`+`vrgather` to synthesize `vdecompress` and lower expanding load to `vcpop`+`load`+`vdecompress`. And if `%mask` is all ones, we can lower expanding load to a normal unmasked load. Fixes #101914.	2024-10-31 20:03:58 +08:00
SpencerAbson	0800351da4	[AArch64][SVE] Use INS when moving elements from bottom 128b of SVE type (#114034 ) Moving elements from a scalable vector to a fixed-length vector should use [INS (vector, element)](https://developer.arm.com/documentation/100069/0606/SIMD-Vector-Instructions/INS--vector--element-) when we know that the extracted element is in the bottom 128-bits of the scalable vector. This avoids inserting unnecessary UMOV/FMOV instructions.	2024-10-31 10:36:00 +00:00
dnsampaio	28d0718033	[DAGCombiner] Add combine avg from shifts (#113909 ) This teaches dagcombiner to fold: `(asr (add nsw x, y), 1) -> (avgfloors x, y)` `(lsr (add nuw x, y), 1) -> (avgflooru x, y)` as well as to combine them into a ceil variant: `(avgfloors (add nsw x, y), 1) -> (avgceils x, y)` `(avgflooru (add nuw x, y), 1) -> (avgceilu x, y)` iff valid for the target. Removes some of the ARM MVE patterns that are now dead code. It adds the avg opcodes to `IsQRMVEInstruction` so as to preserve the immediate splatting as before.	2024-10-31 10:57:27 +01:00
WANG Rui	862074fa57	[LoongArch][NFC] Pre-commit tests for the adjacency of expanded pseudo-insns	2024-10-31 16:59:41 +08:00
Stanislav Mekhanoshin	7cd29741fa	[AMDGPU] Extend mov_dpp8 intrinsic lowering for generic types (#114296 ) The int_amdgcn_mov_dpp8 is overloaded, but we can only select i32. To allow a corresponding builtin to be overloaded the same way as int_amdgcn_mov_dpp we need it to be able to split unsupported values.	2024-10-31 01:15:25 -07:00
Ami-zhang	1897bf61f0	[LoongArch] Enable FeatureExtLSX for generic-la64 processor (#113421 ) This commit makes the `generic` target support FP and LSX, as discussed in #110211. Thereby, it allows 128-bit vectors to be enabled by default in the loongarch64 backend.	2024-10-31 15:58:15 +08:00
Luke Lau	6da5968f5e	[RISCV] Lower scalar_to_vector for supported FP types (#114340 ) In https://reviews.llvm.org/D147608 we added custom lowering for integers, but inadvertently also marked it as custom for scalable FP vectors despite not handling it. This adds handling for floats and marks it as custom lowered for fixed-length FP vectors too. Note that this doesn't handle bf16 or f16 vectors that would need promotion, but these scalar_to_vector nodes seem to be emitted when expanding them.	2024-10-31 13:15:17 +08:00
Thorsten Schütt	6effab990c	Revert "[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE" (#114353 ) Reverts llvm/llvm-project#114310	2024-10-31 05:41:16 +01:00
Thorsten Schütt	6bf214b7c6	[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE (#114310 ) There are patterns for: * {nxv2s32, s32, s64}, * {nxv4s16, s16, s64}, * {nxv2s16, s16, s64}	2024-10-31 04:56:41 +01:00
Adam Yang	948249d804	Revert "[DXIL] Add GroupMemoryBarrierWithGroupSync intrinsic" (#114322 ) Reverts llvm/llvm-project#111884	2024-10-30 20:44:54 -07:00
Feng Zou	8127162427	[X86][AMX] Support AMX-FP8 (#113850 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-10-31 10:14:25 +08:00
Luke Lau	1cb599835c	[RISCV] Remove redundant +zfh from +zvfh[min] tests. NFC In the vast majority of f16 tests we don't end up emitting any scalar code that needs +zfh, so remove it.	2024-10-31 06:51:39 +08:00
Vyacheslav Levytskyy	c616f24bcb	[SPIR-V] Do instruction selection for G_BITCAST on an earlier stage (#114216 ) This PR implements instruction selection for G_BITCAST on an earlier stage to avoid MachineVerifier complaints about a subtle semantic difference between G_BITCAST and OpBitcast. We do instruction selection for OpBitcast after IR Translation instead of calling MIB.buildBitcast() generating the general op code G_BITCAST, because when MachineVerifier validates G_BITCAST we see a check of a kind: 'if Source Type is equal to Destination Type then report error "bitcast must change the type"'. This doesn't take into account the notion of a typed pointer that is important for SPIR-V where a user may and should use bitcast between pointers with different pointee types (https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpBitcast). It's important for correct lowering in SPIR-V, because interpretation of the data type is not left to instructions that utilize the pointer, but encoded by the pointer declaration, and the SPIRV target can and must handle the declaration and use of pointers that specify the type of data they point to. It's not feasible to improve validation of G_BITCAST using just information provided by low level types of source and destination. Therefore we don't produce G_BITCAST as the general op code with semantics different from OpBitcast, but rather lower to OpBitcast immediately. See discussion in https://github.com/llvm/llvm-project/pull/110270 for even more context.	2024-10-30 20:49:21 +01:00
Steven Perron	d8295e2eec	[SPIRV][HLSL] Handle arrays of resources (#111564 ) This commit adds the ability to get a particular resource from an array of resources using the handle_fromBinding intrinsic. The main changes are: 1. Create an array when generating the type. 2. Add capabilities from [SPV_EXT_descriptor_indexing](https://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/EXT/SPV_EXT_descriptor_indexing.html). We are still missing the ability to declare a runtime array. That will be done in a follow up PR.	2024-10-30 15:01:02 -04:00
Thorsten Schütt	b3bb6f18bb	[GlobalISel] Import samesign flag (#114267 ) Credits: https://github.com/llvm/llvm-project/pull/111419 Fixes icmp-flags.mir First attempt: https://github.com/llvm/llvm-project/pull/113090 Revert: https://github.com/llvm/llvm-project/pull/114256	2024-10-30 19:56:25 +01:00
Craig Topper	408c84f35b	[RISCV] Add hasPostISelHook to sf.vfnrclip pseudo instructions. (#114274 ) Add Uses = [FRM] to the underlying MC instructions. Tweak a couple test cases so the MachineVerifier would have caught this.	2024-10-30 11:52:49 -07:00
Craig Topper	c3724ba866	[RISCV] Add OperandType for vector rounding mode operands. (#114179 ) Use TSFlags to distinguish which type of rounding mode it is. We use the same tablegen base classes for vxrm and frm sometimes so it's hard to have different types for different instructions.	2024-10-30 11:46:15 -07:00
Changpeng Fang	ca1154d1d4	AMDGPU: Disable pattern matching "x<<32-y>>32-y" to "bfe x, 0, y" (#114279 ) It is not correct to lower "x<<32-y>>32-y" to "bfe x, 0, y". When y equals 32, the left-hand side is still x (unchanged), however, the right-hand side will be evaluated to 0. So it is not always correct to do such a transformation. We may be able to keep the pattern for immediate y while y is within [0, 31]. However, the immediate operands of the sub (32 - y) are easily folded, and "(x << imm) >> imm" will be lowered to "and x, (2^(32-imm))-1" anyway. So no bfe matching is needed.	2024-10-30 11:07:15 -07:00
Jay Foad	311c0772f9	[AMDGPU] Fix test failures after #114232 and #114200	2024-10-30 16:51:44 +00:00
Jay Foad	6bf4476ffb	[AMDGPU] Fix @llvm.amdgcn.cs.chain with callee not provably uniform (#114200 ) The correct behavior is to insert a readfirstlane. This worked except for an inappropriate assertion in SITargetLowering::LowerCall.	2024-10-30 16:18:29 +00:00
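
The folds listed in commit 28d0718033 ([DAGCombiner] Add combine avg from shifts) depend on the `nsw`/`nuw` flags of the add. The Python sketch below models 8-bit arithmetic to show why: when the narrow add does not wrap, the shift of the sum equals the floor average, and applying the floor average to that sum and 1 gives the ceil average; when the add wraps, the shift-based form diverges. The 8-bit width and every helper name (`add_wrap`, `asr1`, `avgfloor`, and so on) are assumptions made for illustration and are not LLVM APIs.

```python
# Sketch of why the avg folds need nsw/nuw, using 8-bit integers.
# All helper names here are hypothetical; they are not LLVM APIs.

BITS = 8
MASK = (1 << BITS) - 1


def to_signed(v):
    """Interpret the low BITS bits of v as a two's-complement value."""
    v &= MASK
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v


def add_wrap(x, y):
    """Narrow add that wraps; returns the unsigned 8-bit pattern."""
    return (x + y) & MASK


def asr1(bits):
    """Arithmetic shift right by one of an 8-bit pattern (signed result)."""
    return to_signed(bits) >> 1


def lsr1(bits):
    """Logical shift right by one of an 8-bit pattern (unsigned result)."""
    return (bits & MASK) >> 1


def avgfloor(x, y):
    """floor((x + y) / 2), computed without intermediate overflow."""
    return (x + y) >> 1


def avgceil(x, y):
    """ceil((x + y) / 2), computed without intermediate overflow."""
    return (x + y + 1) >> 1


def signed_add_overflows(x, y):
    return not -(1 << (BITS - 1)) <= x + y < (1 << (BITS - 1))


def unsigned_add_overflows(x, y):
    return x + y > MASK


def check_folds():
    """Compare shift-of-add with the avg forms over all 8-bit inputs.

    Returns how many wrapping (signed, unsigned) input pairs disagree.
    """
    signed_bad = unsigned_bad = 0
    for x in range(-128, 128):
        for y in range(-128, 128):
            s = add_wrap(x, y)
            same = asr1(s) == avgfloor(x, y)
            if signed_add_overflows(x, y):
                signed_bad += not same
            else:
                assert same  # the add is nsw, so the floor fold is exact
                assert avgfloor(to_signed(s), 1) == avgceil(x, y)
    for x in range(256):
        for y in range(256):
            s = add_wrap(x, y)
            same = lsr1(s) == avgfloor(x, y)
            if unsigned_add_overflows(x, y):
                unsigned_bad += not same
            else:
                assert same  # the add is nuw, so the floor fold is exact
                assert avgfloor(s, 1) == avgceil(x, y)
    return signed_bad, unsigned_bad


if __name__ == "__main__":
    print(check_folds())
```

Calling `check_folds()` walks every pair of 8-bit inputs; the internal assertions cover the non-wrapping cases, and the returned counts show how many wrapping pairs would be miscompiled if the flags were ignored.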


55755 Commits